Methods, apparatus and computer program products for formulating culture media

ABSTRACT

Methods, apparatus and computer program products are provided for identifying a component of culture medium based on the parameters (e.g., physical, chemical, biological and/or topological characteristics) of compounds from within a compound library. In preferred embodiments, the compound library is a peptide library. Also provided are methods of selecting a compound library from a larger compound space based on whole molecule (i.e., global) parameters of the compounds. Preferably, this method is practiced in conjunction with a method of identifying a component of a culture medium. Further provided are methods of predicting a biological activity of a peptide based on at least one whole molecule parameter of the peptide. This method finds use in methods of drug discovery, identifying components of culture medium, and identifying and/or designing peptides with particular pharmacological or therapeutic activities.

FIELD OF THE INVENTION

[0001] This invention relates to methods, apparatus and computer program products for formulating culture media, and more particularly to methods, apparatus and computer program products for identifying culture media components.

BACKGROUND OF THE INVENTION

[0002] Media used to grow cultured cells for both industrial and clinical applications are usually chemically undefined or, at best, semi-defined. It would be advantageous to employ chemically-defined media in cell culture systems; however, most attempts to grow cultured cells exclusively in chemically-defined media have been unsuccessful and result in a high frequency of cell death and/or poor cell performance. These efforts have failed, at least in part, because those components of undefined media that promote cell viability and performance remain uncharacterized. Thus, it would be desirable to develop improved methods of identifying media components that support and enhance cell viability and performance.

[0003] Hydrolysates are the most common undefined substance used in bacteriological media today for both clinical and fermentation applications. Current media optimization frequently starts with these undefined substances. Hydrolysates are used to replace serum (another undefined substance) in mammalian culture (Saha and Sen, (1989) Acta Virol. 33:338).

[0004] The inclusion of tissue and protein hydrolysates in bacterial growth media has been practiced since the late 1800s. Retger, (1927) J. Immunology 13:323, provided some of the earliest details on preparing and combining hydrolysates. Retger's data demonstrated that the best toxin yields from Corynebacterium diphtheriae were obtained with medium formulated with hydrolysates containing a lower percentage of full-length proteins and a higher concentration of peptides. Kihara et al., (1952) J. Biol. Chem. 197:801, also found that cultured cells performed better in medium containing peptides as compared with the constituent amino acids.

[0005] There are several drawbacks associated with using digests in cell culture media. For example, the range of peptide sequences available for incorporation into culture media is limited by the starting substrate and the enzyme or acid used in the digestion. Many of the currently-available digests are difficult to reproduce, with significant lot-to-lot variation being commonplace. Digests of tissue obtained from a slaughterhouse are often used as components of cell culture media. These digests are among the most difficult to reproduce since the starting material and ratios of the starting substrate vary. Casein digests are more reproducible because milk is a more homogenous source of protein than slaughterhouse tissue. However, the range of peptide sequences generated by an enzymatic or acid digestion of casein is small. Most media manufacturers blend these two types of digest to yield a medium that provides better results than can be achieved with either digest used alone. Unfortunately, the use of blends adds another level of complexity to the manufacturing process and is also a source of lot-to-lot variability.

[0006] Moreover, both hydrolysates and sera are problematic for pharmaceutical applications, since each potentially harbors pathogens. The undefined nature of hydrolysates and sera also leads to problems in manufacturing. To illustrate, the high molecular weight components found in both types of undefined substances create additional downstream processing costs. Furthermore, the undefined nature of hydrolysates and sera leads to lot-to-lot variability. Previous attempts at developing chemically-defined media, however, have generally suffered from sub-optimal cell performance and unacceptable levels of cell mortality.

[0007] Historically, there have been several approaches employed to determine the type(s) of nutrients consumed or preferred by cells grown in culture. One of the most common practices is post-culture analysis, whereby the spent medium is evaluated to identify constituents removed from the medium. This approach has only rarely led to the identification of compounds that can be isolated for use as a medium component or can be employed as a benchmark to monitor future hydrolysate or serum lots. In addition, this approach cannot identify compounds that function through signaling and are not physically removed from the medium.

[0008] Zhao et al., (1996) Appl. Microbiol. Biotechnol. 45:778, compared the bacterial growth-stimulating activity of bovine hemoglobin with that of specific peptides from a peptic hydrolysate of this protein. A particular peptide fragment was demonstrated to promote cell growth of gram-negative bacteria to a greater extent than the intact protein. No such enhancement was observed when the constituent amino acids of this active fragment were assayed, leading these investigators to suggest that this peptide fragment did not simply act as an amino acid supplier, but rather interacted with peptide permeases on the cell membrane. This strategy can be employed in other systems as well to provide some information as to specific biologically-active peptides produced by hydrolysis of whole proteins.

[0009] More recently, Automated Cell Technologies (ACT; Pittsburgh, Pa.) has suggested that it will use a “combinatorial cell culture” to discover improved media for growing hematopoietic stem cells ex vivo. In Vivo: Bus. Med. Rep. 15:38 (December 1997). Utilizing a 384-well microtiter plate and a robotic pipetting system, ACT proposes to add different growth factor combinations to various mixes of culture media in an effort to identify a culture medium that will support stem cell growth ex vivo. It is unstated whether chemically-defined or undefined media will be used.

[0010] All of the previously-described methods fail to further an understanding of the physical, chemical or other properties of the medium components that contribute to the enhanced cell performance in culture. As a result, these methods are inefficient as they fail to provide a means of predicting and systematically screening additional lead compounds.

[0011] Most of the research on quantitative structure-activity relationships (QSAR) has focused on small organic molecules in medicinal and environmental chemistry. Peptides have not been studied as much owing to difficulties in developing descriptors for amino acid side chains. The earliest attempt to quantify amino acids for QSAR was by Sneath, (1966) J. Theoretical Biology, 12:157, who developed four semi-quantitative descriptors for amino acids. Later, Hellberg et al., (1987) J. Med. Chem. 30:1126, developed a set of principal components, which were derived from twenty-nine measured and theoretical properties of amino acids, including molecular weight, isoelectric point, nuclear magnetic resonance parameters, logP (hydrophobicity), thin layer chromatography, and high performance liquid chromatography parameters. Principal component analysis led to three principal components, which Hellberg et al. called z1, z2 and z3. Theoretically-derived parameters have been developed by Norinder, (1991) Peptides, 12:1223, and Cocchi and Johansson, (1993) Quant. Struct.-Act. Relat. 12:1. In all of these instances, parameters were developed for the individual amino acids, but none were measured on whole peptides.

[0012] Cho et al., (1998) J. Chem. Inf. Comput. Sci. 38:259, used a rational drug design approach to identify peptides in a targeted virtual library having bradykinin-potentiating activity. This group identified virtual peptides predicted to be enhanced in amino acid building blocks with bradykinin-like activity based on analysis of a starting set of peptides known to possess such activity. These investigators employed topological indices or physicochemical descriptors of individual amino acid building blocks from known leads to design a virtual targeted library. Cho et al. also report that computational limitations prevented complete analysis of all indices for every virtual peptide within their targeted library. Cho et al. did not apply their methods to media development.

[0013] Thus, previous attempts to improve culture media have largely relied on ad hoc trial-and-error techniques. There remains a need in the art for systematic and predictive methods for identifying medium components to improve cell performance in culture. Moreover, there is a need in the art for high-throughput methods for identifying medium components.

SUMMARY OF THE INVENTION

[0014] Development of industrial and clinical culture media has been impeded, in large part, because of a lack of understanding as to which components of culture media promote cell performance. To compensate, currently-utilized media generally rely on the inclusion of complex and chemically-undefined substances, such as hydrolysates or serum, to support culture performance. The inability to identify and predict defined compounds for culture media formulations that will enhance culture performance is problematic, particularly in an industrial manufacturing setting where low cost, high performance, and reproducible systems are critical.

[0015] It is therefore an object of the present invention to provide methods, apparatus and computer program products for identifying culture medium components and for pairing new culture medium components with established media components.

[0016] It is also an object of the present invention to provide methods, apparatus and computer program products for predicting the activity (e.g., biological activity) of compounds from a compound library.

[0017] It is still another object of the present invention to provide methods, apparatus and computer program products for defining a test compound library from a larger compound space.

[0018] It is yet a further object of the present invention to provide methods, apparatus and computer program products for predicting an activity of a peptide based on parameters (i.e., descriptors) for whole molecules, constituent amino acids, or combinations thereof.

[0019] It is still a further object of the invention to provide more economical and rapid methods of identifying compounds with desired activities, e.g., for use in culture medium, drug discovery and therapy, and/or diagnostics.

[0020] These and other objects, features and advantages are provided by the present invention, which utilizes systematic design methods to identify lead compounds and to predict the structures of additional leads, e.g., for the formulation of culture media. Moreover, the present inventors have discovered that whole molecule descriptors of compounds (e.g., total dipole moment, molecular weight) can play an important (and in some cases, primary) role in predicting the activity of a compound and in more efficient exploration of compound space.

[0021] The media formulated by the present invention can result in improved products for diagnostic applications, for example, in products where the media is a single component such as plated media, or as a stand-alone product such as dehydrated culture media or liquid media. The new formulas may also enhance the manufacturing of products cultured in fermenters and bioreactors. In addition, the media formulated according to the present invention can provide an improved environment for cell research and drug discovery. In particular, the present invention can facilitate the development of culture media to maintain and propagate primary cells and cell lines that have been difficult to maintain in vitro in traditional media.

[0022] According to one preferred embodiment of the invention, a first test library of compounds is evaluated to identify compounds within the library with a desired characteristic(s) for use as a component of a culture medium. A plurality of culture media, each containing a respective test compound(s) from within the first test library, is screened by measuring an indicia of a property of the culture media. Typically, the indicia of the property is measured from a plurality of first cell cultures each containing a respective culture medium containing a respective test compound. Illustrative properties of the culture media containing the test compounds that may be measured according to the present invention include the ability to alter (e.g., induce or enhance, alternatively, suppress or inhibit) the growth of cultured cells, the ability to alter the production (e.g., at the level of transcription, translation, post-translational processing, intracellular transport, secretion, turnover, and the like) of a peptide(s) and/or protein(s) (e.g., antigens, toxins, antibodies, hormones, growth factors, cytokines, clotting factors, and enzymes), and the ability to alter the synthesis and/or secretion of other compounds including but not limited to antibiotics, steroids, carbohydrates, lipids and nucleic acids. Additional properties include the ability to alter (as defined above) the maturation, differentiation, growth and/or proliferation of cells.

[0023] In particular embodiments, a relationship (e.g., a mathematical relationship) is determined between at least one parameter or descriptor (e.g., physical, chemical, biological and/or topological parameters) of the test compounds from within the first test library which are included in the plurality of first culture media and the measured indicia of the property. The relationship can be used as a predictor to identify additional lead compounds as components of culture media that are expected, based on their parameters, to give indicia of the measured property that satisfy a test requirement. Illustrative parameters that may be employed according to the present invention include but are not limited to molecular weight, charge, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity of the whole molecule (e.g., peptide, oligonucleotide, carbohydrate, lipid, etc.) or individual building block (e.g., amino acid, nucleotide, monosaccharide, triglyceride, etc.) in the molecule. Any suitable topological parameter known in the art may be employed, such as those described by L. B. Kier and L. H. Hall, Molecular Connectivity in Structure-activity Analysis, Research Studies Press, John Wiley & Sons, Letchworth England (1986); M. Johnson et al., Concepts and Applications of Molecular Similarity, John Wiley & Sons, New York (1990); and R. P. Sheridan et al., (1995) J. Chem. Inf. Comput. Sci. 35:310. The term “parameters” as used herein also encompasses the principal components of S. Hellberg et al., (1987) J. Med. Chem. 30:1126 (e.g., z1, z2, z3).

[0024] A test requirement is determined against which the measured indicia of the property are compared. The test requirement may be determined a priori or it may be determined before or after operations to determine a relationship between the parameter(s) of the first test compounds and the measured indicia of the property of the plurality of culture media containing the first test compounds. The test requirement may be determined so that indicia of the property falling above the test requirement are desirable. Alternatively, the test requirement may be chosen so that indicia of the property falling below the test requirement are preferred. As a further alternative, the test requirement may be such that indicia of the measured property that fall within a particular range are preferred (e.g., for cell growth, it may be advantageous to select for growth above a threshold level but below a maximum level, as growth rates above the maximum may adversely affect other aspects of cell performance). Alternatively, the test requirement may be qualitative, rather than quantitative, in nature.
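By way of non-limiting illustration, the three quantitative forms of test requirement described in the preceding paragraph (above a threshold, below a threshold, or within a range) may be sketched as a single predicate. The class name `Requirement`, its fields, the inclusive treatment of the bounds, and the numeric values are assumptions made for this sketch only and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    """Hypothetical quantitative test requirement.

    lower only  -> indicia at or above the threshold are acceptable
    upper only  -> indicia at or below the threshold are acceptable
    both        -> indicia inside the closed range are acceptable
    Treating the bounds as inclusive is an assumption of this sketch.
    """
    lower: Optional[float] = None
    upper: Optional[float] = None

    def is_satisfied(self, indicia: float) -> bool:
        if self.lower is not None and indicia < self.lower:
            return False
        if self.upper is not None and indicia > self.upper:
            return False
        return True


# Illustrative values only: a growth window with a floor and a ceiling.
growth_window = Requirement(lower=0.5, upper=2.0)
measured = {"medium_A": 0.3, "medium_B": 1.2, "medium_C": 2.6}
passing = [name for name, value in measured.items()
           if growth_window.is_satisfied(value)]
```

A qualitative test requirement, which the paragraph also contemplates, would need a different representation and is not covered by this sketch.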

[0025] The relationship is used to predict the structure of a plurality of untested compounds each of which, when included as a component of a culture medium, is expected to provide indicia of the property that satisfies the test requirement. Operations are also performed to identify a second test library containing a plurality of second test compounds as components of a plurality of second culture media. The plurality of second culture media, which each contain a respective test compound from within the second test library, are predicted to provide indicia of the property that satisfy the test requirement based on the relationship determined between the parameter(s) of the first test compounds and the first indicia of the measured property.

[0026] Optionally, and preferably, the plurality of second test compounds will include at least one compound that was not among the set of first test compounds. It is also preferred that the plurality of second test compounds includes one or more test compounds from within the first test library (i.e., as a control).

[0027] The steps of measuring indicia of a property of a plurality of culture media each containing a respective test compound, determining a relationship between the measured indicia and at least one parameter of the test compounds, and then determining a follow-up set of compounds may be carried out more than one time so as to identify a compound(s) that provides a desired indicia of the property when included as a component of culture media. Alternatively, if the first set of test compounds provides a compound having a desired activity, the screening process may end at that point without screening a second, or successive, set of test compounds.

[0028] The relationship determined between the parameter(s) of the first test compounds and the indicia of the measured property can be determined by any method for describing the interaction between the activity and structure of compounds, for example, by quantitative structure-activity relationships (QSAR), nearest neighbor analysis, self-organizing maps, or other machine learning and statistical techniques.

[0029] In one preferred embodiment, the relationship may be expressed in the form of ŷᵢ=f(xᵢⱼ), where xᵢⱼ denotes a parameter, i ranges from 1 to n, where n represents the number of first culture media, j ranges from 1 to d, where d represents the number of parameters measured, and ŷᵢ represents an estimate of the measured first indicia of the property. The relationship represented by ŷᵢ=f(xᵢⱼ) may be a parametric or non-parametric formula.
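For illustration only, one simple parametric choice of f is a linear model fitted by ordinary least squares, in which the estimate for medium i is b0 + b1·x_i1 + ... + bd·x_id. The specification does not prescribe this form; the linear model, the function names `fit_linear` and `predict`, and the example values below are assumptions of this sketch.

```python
from typing import List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y_i ~ b0 + sum_j b_j * x_ij.

    X holds n rows of d parameter values; y holds the n measured indicia.
    Returns [b0, b1, ..., bd].
    """
    rows = [[1.0, *xi] for xi in X]  # prepend an intercept column
    p = len(rows[0])
    # Normal equations (A^T A) b = A^T y, kept as an augmented matrix.
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("parameters are collinear; fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Estimated indicia for an untested compound with parameter values x."""
    return coeffs[0] + sum(b * xj for b, xj in zip(coeffs[1:], x))


# Made-up example: n = 4 media, d = 2 parameters, generated from y = 1 + 2*x1 + 3*x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 4.0, 6.0, 8.0]
coeffs = fit_linear(X, y)
estimate = predict(coeffs, [3.0, 2.0])
```

Solving the normal equations directly keeps the sketch dependency-free; for real screening data with many or correlated descriptors, a numerically more robust solver or a non-parametric method would likely be preferable.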

[0030] According to another preferred embodiment of the present invention, the relationship between the parameter(s) of the test compounds and the indicia of the measured property is based on a distance function between the parameters of the tested compounds in the first test library and the parameters of untested compounds. The distance function can be expressed as d(x₁, x₂) between a first value of a parameter, x₁, of a first test compound and a second value of the same parameter, x₂, of a second untested compound. This relationship will assign to culture media containing a second untested compound an estimated indicia of the property that corresponds to the measured indicia determined from a culture medium containing a first tested compound from the first test library if d(x₁, x₂)≤d_cutoff1, where d_cutoff1 is a cutoff distance for the first test compound. In other words, once a lead compound is identified from the first test library, additional lead compounds can be determined based on an assumption that compounds that are close in parameter space will exhibit similar activities. Accordingly, there is an increased probability that compounds close in parameter space will provide similar or better indicia of the measured property. In particular embodiments of the invention, x₁ and x₂ represent a single parameter or, alternatively, a set of parameters, i.e., x₁=x₁₁, x₁₂, x₁₃, x₁₄ . . . x₁ₖ and x₂=x₂₁, x₂₂, x₂₃, x₂₄ . . . x₂ₖ, where k≥1. One specific example of a method of determining a relationship based on distance in parameter space is “nearest neighbor” analysis. Other non-limiting and illustrative methods are cluster analysis, self-organizing maps, and machine learning approaches. See generally, B. B. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, New York (1996).
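The distance-based assignment described in the preceding paragraph may be sketched as follows. This is a minimal illustration, assuming a Euclidean distance over the parameter vector and assuming that, when several tested compounds lie within their cutoffs, the nearest one is used; the specification fixes neither choice. The function names and numeric values are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (parameter vector, measured indicia, cutoff distance for this tested compound)
Tested = Tuple[Sequence[float], float, float]


def euclidean(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Assumed distance function d(x1, x2) over k parameters."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def estimate_indicia(untested: Sequence[float],
                     tested: List[Tested]) -> Optional[float]:
    """Return the measured indicia of the nearest tested compound whose
    cutoff distance covers the untested compound, or None if none does."""
    in_range = []
    for params, measured, cutoff in tested:
        distance = euclidean(params, untested)
        if distance <= cutoff:  # d(x1, x2) <= d_cutoff for this tested compound
            in_range.append((distance, measured))
    if not in_range:
        return None
    return min(in_range, key=lambda pair: pair[0])[1]


# Made-up first test library with two tested compounds (k = 2 parameters).
library = [((0.0, 0.0), 1.5, 1.0), ((5.0, 5.0), 0.2, 1.0)]
near_first = estimate_indicia((0.3, 0.4), library)
no_neighbor = estimate_indicia((2.5, 2.5), library)
```

In practice the parameters would usually be scaled to comparable ranges before computing distances, since an unscaled parameter with large numeric values (such as molecular weight) would otherwise dominate the distance.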

[0031] In still another particular embodiment, more than one relationship may be used to identify a plurality of second test compounds from within a second test library that are predicted to give indicia of the property that satisfy the test requirement when included as a culture medium component (e.g., both a QSAR-type and nearest neighbor-type relationship may be employed). Furthermore, the methods of the present invention described hereinabove are preferably practiced in an iterative fashion, whereby the lead compounds identified in the second test library can be used to determine additional lead compounds in a third test library, etc., until compounds that provide the desired characteristics are identified. Moreover, the relationship determined in each iteration need not be fixed. To illustrate, one type of relationship may be determined to identify a set of second test compounds, but a different relationship may be determined in subsequent iterations.

[0032] In addition, the inventive methods can be used to identify a “cocktail” of compounds for formulating culture media; the methods described hereinabove are not limited to media containing only a single test compound therein.

[0033] The second test library may be partially or completely co-extensive with the first test library (i.e., encompass some or all of the same compounds). Alternatively, there may be no common compounds in the first and second test libraries. The second test library will generally be smaller in size than the first test library, and may be a subset thereof.

[0034] The test compounds from the first and second test libraries may be selected using any suitable method known in the art. Preferably, the test compounds are selected from the compound space based on a space-filling design. It is further preferred that the test compounds selected from the compound space are representative of the entire compound space. Exemplary space-filling designs include but are not limited to full factorial designs, fractional factorial designs, maximum diversity libraries, genetic algorithms, coverage designs, spread designs, cluster based designs, Latin Hypercube Sampling, and other optimal designs (e.g., D-Optimal) and the like. The second test library can be selected from within the first test library or, alternatively, from within the compound space.
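As a non-limiting sketch of how a space-filling selection might be carried out, the following uses a greedy maximin (farthest-point) heuristic, which is one simple way to approximate a spread-type design. It is not asserted to be the design used by the inventors. The function name `maximin_select`, the arbitrary choice of starting point, and the example coordinates are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def maximin_select(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k point indices so that each new pick is the candidate
    farthest (in Euclidean distance) from everything already picked."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [0]  # arbitrary starting point (an assumption of this sketch)
    while len(selected) < k:
        remaining = [i for i in range(len(points)) if i not in selected]
        # Score each candidate by its distance to the closest selected point.
        best = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[s]) for s in selected),
        )
        selected.append(best)
    return selected


# Made-up one-parameter compound space; values could stand for any descriptor.
space = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)]
chosen = maximin_select(space, 3)
```

Because the heuristic is greedy, the result depends on the starting point and is not guaranteed to be the optimal spread; it is shown only to make the notion of a library that is representative of the whole compound space concrete.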

[0035] As a further alternative, the information obtained from screening a first compound space can be used to start screening a second compound space. For example, if the first compound space only contained tetrapeptides, the information and/or lead peptides achieved by screening the tetrapeptide space can be used to begin screening a pentapeptide space. As still a further alternative, and as described below, the first compound space may contain compounds of different sizes. According to this embodiment, in the preceding example it may not be necessary to go outside the first compound space if this space contained both tetrapeptides and pentapeptides.

[0036] According to an additional preferred embodiment, the present invention provides a method of defining a test library from a larger compound space. Preferably, the test compound library will be representative of the compound space. In many instances, a compound space of interest may be so vast that it is computationally difficult to determine a test library therefrom. In particular, the space may be so large that it is not computationally feasible to evaluate the entire space in forming a test library. According to the present invention, the compound space can be reduced by grouping all compound isomers therein as single candidate compounds based on at least one global parameter or descriptor (e.g., compounds having the same molecular weight or chemical formula). Thus, each set or group of compound isomers can be represented as a respective candidate compound. The contraction of the compound space can advantageously simplify the process of determining the first test library (or a follow-up library), and is based on the principle that compound isomers may exhibit similar activities because of shared whole molecule parameters (e.g., molecular weight, lipophilicity, or charge, as compared with sequence-specific parameters). A test library can be selected from this reduced compound space, e.g., by using a space-filling design.
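The reduction of a compound space by grouping isomers may be illustrated for a small peptide space. The sketch below assumes that peptides are written as one-letter amino acid strings and that "isomers" means sequences with the same amino acid composition (and hence, for linear peptides, the same chemical formula and molecular weight). The function name `reduce_space` and the three-letter alphabet are hypothetical and chosen only to keep the example small.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List


def reduce_space(peptides: Iterable[str]) -> Dict[str, List[str]]:
    """Group sequence isomers under one candidate key.

    The key is the sorted sequence, so every permutation of the same
    residues (same composition) maps to the same candidate compound.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sequence in peptides:
        groups["".join(sorted(sequence))].append(sequence)
    return dict(groups)


# Toy compound space: all dipeptides over a three-residue alphabet (9 sequences).
alphabet = "AGK"
compound_space = ["".join(p) for p in product(alphabet, repeat=2)]
candidates = reduce_space(compound_space)
```

In this toy space the nine dipeptides collapse to six candidate compounds; the reduction grows much more pronounced for longer peptides and larger alphabets, since the number of sequences grows far faster than the number of distinct compositions.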

[0037] Optionally, and advantageously, after reducing the compound space, some of the compound isomers within the reduced compound space are selected and expanded to re-introduce sequence-specific parameter(s) (e.g., z values, isotropic surface area, electronic charge index, hydrophobicity) of individual amino acids indexed to their relative positions in the sequence. In this expansion step, preferably less than all of the grouped candidate compounds in the reduced compound space are selected and are then re-expanded into their constituent compound isomers. A test library can be selected from the re-expanded set of constituent compound isomers. In one particular embodiment, the test library contains at least one representative compound from each expanded group of candidate compounds.
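The re-expansion step may be sketched in the same toy setting, again assuming that a candidate compound is identified by its amino acid composition written as a one-letter string. The function names and the rule for picking a representative (simply the first isomer in sorted order) are placeholders for this sketch; in practice the representative would presumably be chosen using the re-introduced sequence-specific parameters, e.g., by a space-filling design.

```python
from itertools import permutations
from typing import List, Sequence


def expand(candidate: str) -> List[str]:
    """Re-expand one candidate compound into its distinct sequence isomers."""
    return sorted({"".join(p) for p in permutations(candidate)})


def build_test_library(selected_candidates: Sequence[str]) -> List[str]:
    """Take one representative isomer from each expanded candidate group.

    Picking the first isomer in sorted order is a placeholder rule only.
    """
    library = []
    for candidate in selected_candidates:
        isomers = expand(candidate)
        library.append(isomers[0])
    return library


# Hypothetical selection of fewer than all candidates from a reduced tripeptide space.
selected = ["AAG", "AGK"]
test_library = build_test_library(selected)
```

Expanding only the selected candidates keeps the number of sequences that must be described position by position small, which is the computational benefit the paragraph attributes to the reduce-then-expand approach.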

[0038] The test library is selected from the compound space using any suitable method known in the art. Preferably, the test library is selected based on a space-filling design as described hereinabove.

[0039] Another embodiment of the invention relates to test compound libraries generated by the above-described methods. The test compound library is selected from a larger compound space by representing each set of compound isomers in the compound space by a single compound. In preferred embodiments, the test library is formed from the reduced compound space by selecting less than all of the grouped candidate compounds and re-introducing sequence-specific parameters to re-generate the constituent compound isomers. In a more preferred embodiment, the test library is formed by selecting less than all of the constituent compound isomers. More preferably still, the test library is formed by selecting at least one compound from each of the re-expanded groups of constituent compound isomers.

[0040] According to another particular embodiment, the inventive methods of forming a test library by reducing (and optionally re-expanding) a compound space are employed in a method of forming a culture medium (as described hereinabove). These methods can be used to determine the first and/or the second test libraries as well as to determine subsequent follow-up libraries.

[0041] As still a further aspect, the present invention provides a method of predicting the activity of a peptide, preferably based on a whole molecule parameter(s) of the peptide. A relationship (e.g., a mathematical relationship) is determined between a measured indicia of an activity (e.g., a biological activity) of a plurality of peptides from a test peptide library and at least one parameter (preferably at least one whole molecule parameter) of the test peptides. Based on the relationship, the indicia of the activity of untested peptides can be predicted based on the parameter(s), preferably at least one whole molecule parameter, of the untested peptides. This method finds use, e.g., in rational drug design, in developing culture media, in methods of identifying and/or designing peptides that act as receptor agonists or antagonists, and in methods of identifying peptides that induce or enhance, or conversely prevent or inhibit, any activity of any target protein (e.g., a receptor or enzyme), cell, or nucleic acid (e.g., DNA and/or RNA).

[0042] In one particular embodiment, the method is used to identify a peptide with a predicted indicia of an activity that satisfies a test requirement. According to this embodiment, indicia of an activity of a plurality of test peptides from a first test peptide library are measured. A relationship is then determined between at least one parameter, preferably at least one whole molecule parameter, and the measured indicia of the activity of the test peptides. Those skilled in the art will appreciate that the relationship may also include sequence-specific parameters in addition to a whole molecule parameter(s). The relationship can be employed to determine a second test library containing a plurality of test peptides that are predicted to provide indicia of the activity that satisfies the test requirement.

[0043] In particular embodiments, the test peptide library is selected from a larger peptide space. In preferred embodiments, the peptide space is collapsed based on whole molecule parameters, and optionally re-expanded by re-introduction of sequence-specific parameters, prior to selecting the test peptide library therefrom, as described above.

[0044] Still another preferred embodiment of the present invention is an apparatus for identifying a compound(s) for forming a culture medium. The preferred apparatus comprises means for determining a relationship between the measured indicia of a property of a plurality of first culture media, each of which contains a respective first test compound from within a first test library, and at least one parameter of the first test compounds. The preferred means further comprises means for identifying a second test library containing a plurality of second test compounds as components of a plurality of culture media, which, based on the relationship, are expected to provide second indicia of the property that satisfy a test requirement. A computer program product is also provided for controlling the operation of the determining and identifying means and for performing numerical calculations to carry out the above-described operations.

[0045] In particular, a preferred computer program product comprises a computer-readable storage medium having computer-readable program code means embodied in the medium. The preferred computer-readable program code means comprises means for determining a relationship between measured indicia of a property of a plurality of first culture media each containing a respective test compound from within a first test library and at least one physical parameter of the first test compounds. Computer-readable program means is also provided for identifying a second test library containing a plurality of second test compounds as components of a plurality of second culture media, which based on the relationship between the measured first indicia and the parameter(s) are expected to provide indicia of the property which meets a test requirement relating to the measured first indicia. In addition, computer-readable program code means is also provided for performing more detailed ones of the above-described operations numerically. This embodiment of the present invention therefore provides a tool that can more accurately perform library screening to identify compounds for forming an improved culture medium.

[0046] Yet another preferred embodiment of the present invention is an apparatus for defining a test compound library. This preferred apparatus comprises means for representing each of a plurality of groups of compound isomers within a compound space as a candidate compound, so that the compound space is reduced, e.g., to facilitate computational manipulation and sampling thereof. Preferably, the apparatus further comprises means for defining a first test library by selecting and expanding less than all of the candidate compounds into their constituent compound isomers. It is further preferred that the apparatus comprises means for selecting a test library from the expanded compound isomers. A computer program product is also provided for controlling operation of the representing and defining means and performing numerical calculations to carry out the above-described operations.

[0047] A preferred computer program product comprises acomputer-readable storage medium having computer-readable program codemeans embodied in the medium. The preferred computer-readable programcode means comprises means for representing each of a plurality ofcompound isomers from within a first compound space as a respectivecandidate compound. Computer-readable program means is also provided fordefining a first test library by expanding less than all of thecandidate compounds into their constituent compound isomers. Inaddition, computer-readable program code means is also provided forperforming more detailed ones of the above-described operationsnumerically. This embodiment of the present invention is thereforeadvantageous in that it provides a tool for defining a test compoundlibrary from a larger compound space.

[0048] These computer program products may be realized in whole or inpart as software modules running on a computer system. Alternatively, adedicated stand-alone system with application specific integratedcircuits for performing the above-described operations may be provided.

[0049] These and other aspects of the invention are set forth in moredetail in the description of the invention hereinbelow.

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] FIG. 1 is a flow chart illustrating operations performed by methods, apparatus and computer program products according to a first embodiment of the present invention.

[0051] FIG. 2 is a graph of measured indicia of a property of interest determined from a plurality of culture media each containing a respective test peptide from within a peptide library (peptides on x-axis).

[0052] FIG. 3 is a flow chart illustrating operations performed by methods, apparatus and computer program products according to a second embodiment of the present invention.

[0053] FIG. 4A is a graph of measured indicia of a property of interest determined from a plurality of culture media each containing a respective test peptide from within a peptide library (peptides on x-axis).

[0054] FIG. 4B is a graph of the space surrounding a lead peptide from FIG. 4A with respect to two parameters: total dipole and hydrophobicity. The concentric circles indicate different cut-off points for the distance relationship.

[0055] FIG. 5 illustrates a general hardware description of an apparatus for identifying a culture medium component from a compound library according to the present invention.

[0056] FIG. 6 is a flow chart illustrating operations performed by methods, apparatus and computer program products according to a third embodiment of the present invention.

[0057] FIG. 7 is a flow chart illustrating operations performed by methods, apparatus and computer program products according to a fourth embodiment of the present invention.

[0058] FIG. 8 is a flow chart illustrating operations performed by methods, apparatus and computer program products according to a fifth embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0059] The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout.

[0060] Referring now to FIG. 1, preferred operations 100 for identifying a component of a culture medium use the parameter(s) or descriptor(s) (e.g., physical, chemical, biological and/or topological parameters) of compounds within a compound space to identify those compounds that are predicted to have a particular property (e.g., biological activity) when included as a component of a culture medium. The prediction is based on a relationship that is determined between measured indicia (e.g., quantitative level) of the property observed with known compounds. Alternatively, the measured indicia can be a qualitative measure (e.g., response/no response). The relationship is used to determine those untested compounds that are predicted to give desired indicia of the property based on the parameter(s) of the compounds.

[0061] The present invention can be employed to identify any type of polymeric compound of interest, preferably for use as a component of a culture medium. Exemplary classes of compounds include, but are not limited to, peptides, proteins (including modified proteins, e.g., glycoproteins), lipids, carbohydrates, nucleic acids, and the like. Peptides are the preferred compound.

[0062] Compound libraries can be made by any method known in the art. Individual compounds within the libraries can be isolated and/or synthesized by any method known in the art. In particular, peptides can be synthesized by solid phase or solution phase synthetic methods. For example, the peptides can be synthesized by FMOC chemistry (Atherton et al., (1989) Solid Phase Synthesis: A Practical Approach. IRL Press at Oxford University Press, Oxford, England) on an Advanced ChemTech Model 396 synthesizer. Alternatively, peptides may be synthesized using other variations of the Merrifield approach (Merrifield, (1965) J. Am. Chem. Soc. 85:2149), including Boc chemistry, synthesis on other solid supports (e.g., other resins, pins, etc.), “tea-bag” synthesis (R. A. Houghten, (1985) Proc. Nat. Acad. Sci. USA 82:5131), and by combinatorial methods (e.g., split and divide). Peptides may also be synthesized to include modifications to the carboxyl terminus (e.g., esters, amides, etc.), the amino terminus (e.g., acetyl groups), and other non-naturally occurring amino acids (e.g., norleucine). Methods of oligonucleotide synthesis are also known in the art. See, e.g., Oligonucleotide Synthesis: A Practical Approach, M. J. Gait, Ed., IRL Press, Washington, D.C., 1984. The generation of carbohydrate libraries is described, e.g., in Liang et al., (1996) Science 274:1520. Construction of RNA libraries is known in the art, e.g., by SELEX as described by C. Tuerck et al., (1990) Science 249:505.

[0063] This embodiment of the invention can be used to identify a compound(s) (as described above) for use as a component(s) of culture medium (e.g., cell culture medium, tissue culture medium, organ culture medium, and the like) having any property(ies) of interest. Exemplary properties include, but are not limited to, altering (e.g., enhancing or increasing, or in contrast, inhibiting or suppressing) the growth of cells in culture (e.g., cell division and/or cell size), altering production (e.g., at the level of transcription, translation, post-translational processing, intracellular transport, secretion, turnover, and the like) of a protein(s) and/or peptide(s) (e.g., antigens, toxins, antibodies, hormones, growth factors, cytokines, clotting factors, enzymes, and the like) by cells in culture, and altering the synthesis, processing and/or secretion of other compounds and/or metabolites (e.g., antibiotics, steroids, carbohydrates, lipids, nucleic acids, and the like). Additional properties include the ability to alter (as defined above) the maturation, differentiation, growth and/or proliferation of cells.

[0064] Those skilled in the art will appreciate that a compound may give different indicia of the property depending on the particular cell, tissue or organ to be cultured. Moreover, the indicia of the property may be affected by the base culture medium and/or the presence of other medium components. The culture medium can be used to culture any cell, tissue, or organ of interest. Preferably the culture medium is a cell culture medium. Typically, and preferably, the present invention is used to identify a component of culture medium to culture cells in vitro. In particular, the present invention can be practiced to culture animal (more preferably mammalian, avian or insect), plant, bacterial, protozoan, fungal, or yeast cultures. In addition, the culture medium can be one that is used to package viruses or bacteriophage in host cells. The present invention can also be advantageously employed to identify compounds for culture medium for cultures of primary cells (e.g., to grow β-islet cells for insulin production). Other uses of the present invention are to culture pathogenic organisms for diagnostic purposes, culture cells that have been genetically engineered to express recombinant peptides or proteins (e.g., biopharmaceuticals such as interferon, tissue plasminogen activator, antibodies; industrial enzymes, in particular, low yield industrial enzymes such as restriction enzymes, taq polymerases, synthetases, and the like), and to culture cells to isolate secondary metabolites for use as drugs (e.g., cephalosporins). Finally, the culture medium can be a liquid, semi-solid, or solid culture medium. Preferably, the culture medium is a liquid medium.

[0065] Indicia of the property may be measured using any suitable method known in the art. For example, ELISAs or any other immunoassays relying on specific binding to an antibody or receptor may be employed. Typically, such methods will involve a radiolabeled, fluorescent or other detectable moiety (e.g., a dye or intercalator such as acridine orange for DNA). Measurements may also be determined using labels that produce signals detectable by spectrophotometry (including colorimetry and measurement of optical density), x-ray diffraction or absorption, magnetism, or enzymatic activity. Chemiluminescence and fluorescence lifetime measurements may also be utilized. Suitable labels include fluorophores, chromophores, radioactive isotopes, electron-dense reagents, enzymes, and ligands having specific binding partners (e.g., biotin-avidin). Alternatively, a flow-through assay such as those that employ surface plasmon resonance detection may be used.

[0066] Cell number and/or size can be readily assessed by methods known in the art, e.g., staining and visual observation, turbidity measurements, spectrophotometric methods (including colorimetry and measurement of optical density), counting with an automated cell counter and/or automated plate counter, measurement of total cellular DNA and/or protein, impedance of an electrical field, bioluminescence, carbon dioxide, oxygen or ATP production or consumption, and the like. Proteins and other compounds can be detected and/or quantified using standard analytical techniques such as chromatography, gel separation techniques, and the like. Likewise, methods of detecting nucleic acids are well-known in the art and include specific hybridization to probe sequences and amplification methods (e.g., polymerase chain reaction, strand displacement amplification, etc.). Carbohydrates can be detected by any method known in the art, including but not limited to, carbohydrate-specific staining (e.g., lectins or anthrone-based assays), spectrophotometric methods with dyes or copper, A₂₀₅ measurements, or gas-liquid chromatography.

[0067] Any measurement tool known in the art may be used to take measurements as described above, e.g., a spectrophotometer for absorption or colorimetric measurements, a fluorometer or flow cytometer for fluorescence measurements, a scintillation or gamma counter for radioactive measurements, and an automated cell counter, automated plate counter, or manual plate counter for cell number measurements. As a further example, a microwell reader can be used for fluorescence, absorbance or colorimetric measurements.

[0068] Referring again to FIG. 1, in carrying out this particular embodiment of the invention, a first plurality of culture media each containing a first test compound(s) is provided. Operations are performed to determine indicia of the property of interest for each of the culture media in the first plurality thereof, Block 102.

[0069] The first test compounds are selected from a first test library of compounds, preferably using a space-filling design. It is also preferable that the first test compounds be representative of the first test library. The term “space-filling design” as used herein is intended to be construed broadly and includes all such techniques known to those skilled in the art. Exemplary space-filling designs include but are not limited to full factorial designs, fractional factorial designs, maximum diversity libraries, genetic algorithms, coverage designs, spread designs, cluster based designs, Latin Hypercube Sampling, and other optimal designs (e.g., D-Optimal), and the like.

[0070] A space-filling design assists in selecting experimental design points. Ideally, data would be gathered at every possible combination of the explanatory variables which may possibly affect the response of interest, in other words, so as to fill the entire space. When the candidate space is very large and the number of possible values is large, it may not be feasible to enumerate all such possible combinations, much less physically gather the data. For example, it would generally not be feasible to evaluate all possible peptide tetramers or pentamers (i.e., for the 20 naturally occurring amino acids, 160,000 possible tetramers and 3,200,000 possible pentamers). Space-filling designs provide a strategy for gathering data at a set of design points, such that the data gathered will efficiently represent all candidate compounds, known as the candidate space. When no prior information or knowledge is available, one method of generating a space-filling design is to use a geometric distance-based criterion.

[0071] Two general categories of distance-based designs are minimax and maximin. Assume that C denotes a finite set of possible design points and that there is a distance function d on C×C such that (C, d) is a metric space. Consider subsets D of C of size n. D is called a distance-based design if the design criterion depends on the distance function d. The minimax criterion attempts to cover the experimental space by locating design points so as to minimize the maximum distance from any candidate point to the closest design point. More specifically, call D* a minimax distance design if

$\min\limits_{D}\max\limits_{y \in C} d(y,D) = \max\limits_{y \in C} d(y,D^{*}) = d^{*}$

where $d(y,D) = \min\limits_{x \in D} d(y,x)$.

[0072] The maximin criterion tries to spread the design points in space so as to maximize the minimum distance between the pairs of design points. In particular, we call D° a maximin distance design if

$\max\limits_{D}\min\limits_{x,x^{\prime} \in D} d(x,x^{\prime}) = \min\limits_{x,x^{\prime} \in D^{\circ}} d(x,x^{\prime}) = d^{\circ}$
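By way of non-limiting illustration, the two criteria can be evaluated directly for a very small candidate set. The following is a minimal sketch only: the function names, the Euclidean distance, the exhaustive search, and the one-parameter model candidate set are all assumptions made for illustration, and exhaustive search is not feasible for candidate spaces of realistic size.

```python
from itertools import combinations
from math import dist


def dist_to_design(y, D):
    """d(y, D): distance from candidate point y to the closest design point."""
    return min(dist(y, x) for x in D)


def minimax_value(C, D):
    """Largest distance from any candidate point in C to its closest design point."""
    return max(dist_to_design(y, D) for y in C)


def maximin_value(D):
    """Smallest distance between any pair of distinct design points."""
    return min(dist(x, xp) for x, xp in combinations(D, 2))


def best_design(C, n, criterion):
    """Exhaustively search all n-point subsets of C (tiny model sets only)."""
    designs = combinations(C, n)
    if criterion == "minimax":
        return min(designs, key=lambda D: minimax_value(C, D))
    if criterion == "maximin":
        return max(designs, key=maximin_value)
    raise ValueError("criterion must be 'minimax' or 'maximin'")


# Model candidate set: five compounds described by one made-up parameter.
CANDIDATES = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
SPREAD = best_design(CANDIDATES, 2, "maximin")
COVER = best_design(CANDIDATES, 2, "minimax")
```

In this model set the maximin design takes the two extreme points, whereas a minimax design places its points so that no candidate is more than one unit from a design point, consistent with the boundary and interior tendencies of the two criteria.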

[0073] Maximin designs can be generated by Gosset (Hardin and Sloane, (1992) Operating Manual for Gosset: A General Purpose Program for Constructing Experimental Designs (2d ed.), Mathematical Science Research Center, AT&T Bell Laboratories, Murray Hill, N.J.).

[0074] Approximations to these criteria that are more numerically stable, and that can be found using an exchange algorithm, are the “coverage” and “spread” criteria, respectively. The maximin, or spread criterion, tends to produce designs with a large number of design points at the boundaries of the region or most extreme values, while the minimax, or coverage criterion, produces designs with more points in the interior of the region.

[0075] A coverage design minimizes the following criterion for a choice of parameters p and q:

$c_{pq}(C,D) = \left( \sum\limits_{y \in C} \left( d_{p}(y,D) \right)^{q} / N_{C} \right)^{1/q}$

[0076] where the distance metric is defined as

$d_{p}(y,D) = \left( \sum\limits_{x \in D} \left\| x - y \right\|^{p} / n \right)^{1/p}$

[0077] where p<0 and q>0, N_C is the number of candidate points in C, and n is the number of design points in D.
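As a hedged illustration of how the coverage criterion might be computed, the sketch below evaluates d_p and c_pq for points given as coordinate tuples. The function names and the use of Euclidean distance for the norm are assumptions. Because p is negative, a candidate that coincides with a design point would require raising zero to a negative power; the sketch returns the limiting value of zero in that case.

```python
from math import dist


def d_p(y, D, p):
    """Generalized-mean distance from candidate y to design D (requires p < 0)."""
    if p >= 0:
        raise ValueError("p must be negative")
    distances = [dist(x, y) for x in D]
    if any(d == 0.0 for d in distances):
        return 0.0  # limiting value when y coincides with a design point
    return (sum(d ** p for d in distances) / len(distances)) ** (1.0 / p)


def coverage(C, D, p, q):
    """c_pq(C, D): q-th power mean over candidates of d_p(y, D) (requires q > 0)."""
    if q <= 0:
        raise ValueError("q must be positive")
    return (sum(d_p(y, D, p) ** q for y in C) / len(C)) ** (1.0 / q)


# Model data: three candidates on a line, two of which are design points.
CANDIDATES = [(0.0,), (1.0,), (2.0,)]
DESIGN = [(0.0,), (2.0,)]
VALUE = coverage(CANDIDATES, DESIGN, p=-1, q=1)
```

As p becomes more negative, d_p approaches the distance to the closest design point, which is how this criterion approximates the minimax criterion.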

[0078] Alternatively, as another space-filling design, test libraries can be generated using a genetic algorithm. In general, a genetic algorithm is based on the model of natural selection. Genetic algorithms optimize structures by computationally performing selection, crossover, and mutation in a population of structures in a manner analogous to natural selection. A given population of compounds is encoded as binary structures (“chromosomes”), and their opportunity to “reproduce” and be included in succeeding generations is based on their biological activities. In the reproduction step, the chromosomes for two compounds are crossed at a single point to produce two new “children” compounds. Mutation occurs by randomly changing any single bit in the sequence. The chromosomes are then decoded into compound structures, which are then synthesized and tested, and the process is repeated for the next generation.

[0079] A typical genetic algorithm runs as follows:

[0080] Step 1. Initialize a population of chromosomes, i.e., compounds (this can be done completely at random by a computer, or selected structures can be used to “seed” the initial population).

[0081] Step 2. Evaluate each chromosome in the initial population (e.g., synthesize and test every peptide in the initial population).

[0082] Step 3. Create new chromosomes by mating current chromosomes; apply mutation and recombination as the parent chromosomes mate (this is done by feeding the indicia of the properties of the compounds into the computer, and the program performs the mutation and recombination process).

[0083] Step 4. Delete members of the population to make room for the new chromosomes (the population will always be fixed at a particular size. The program will select which compounds get deleted, which are usually the poorest-performing compounds).

[0084] Step 5. Evaluate the new chromosomes and insert them into the population.

[0085] Step 6. The process can end at this point, with the best chromosome(s) being selected; alternatively, additional generations can be followed by repeating steps 3-5.
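The six steps above can be sketched, purely for illustration, as the following loop. In practice the evaluation steps involve synthesizing and testing compounds; here a hypothetical placeholder fitness function (the number of 1 bits) stands in for measured indicia, and the population size, number of children, mutation rate, and fitness-weighted choice of parents are likewise assumptions rather than requirements.

```python
import random

CHROMOSOME_BITS = 16  # assumed: four positions of four bits each


def placeholder_fitness(chromosome):
    """Hypothetical stand-in for a measured indicium; must be non-negative."""
    return chromosome.count("1")


def crossover(a, b, point):
    """Single-point crossover producing two children."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rng, rate=0.05):
    """With probability `rate`, flip one randomly chosen bit."""
    if rng.random() >= rate:
        return chromosome
    i = rng.randrange(len(chromosome))
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]


def run_ga(fitness, pop_size=20, n_children=4, generations=10, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize a random population of chromosomes.
    population = ["".join(rng.choice("01") for _ in range(CHROMOSOME_BITS))
                  for _ in range(pop_size)]
    # Step 2: evaluate each chromosome.
    scored = [(fitness(c), c) for c in population]
    history = [max(s for s, _ in scored)]
    for _ in range(generations):
        # Step 3: mate current chromosomes, applying crossover and mutation.
        weights = [s + 1e-9 for s, _ in scored]
        children = []
        while len(children) < n_children:
            (_, pa), (_, pb) = rng.choices(scored, weights=weights, k=2)
            point = rng.randrange(1, CHROMOSOME_BITS)
            children.extend(mutate(c, rng) for c in crossover(pa, pb, point))
        children = children[:n_children]
        # Step 4: delete the poorest performers so the population size stays fixed.
        scored.sort(key=lambda sc: sc[0], reverse=True)
        scored = scored[:pop_size - n_children]
        # Step 5: evaluate the new chromosomes and insert them.
        scored.extend((fitness(c), c) for c in children)
        history.append(max(s for s, _ in scored))
    # Step 6: select the best chromosome.
    best = max(scored, key=lambda sc: sc[0])
    return best, scored, history
```

Because only the poorest performers are deleted in each generation, the best score in the population never decreases from one generation to the next in this sketch.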

[0086] To illustrate, the following example using model data is presented. In selecting a peptide library using a genetic algorithm, the chromosomes will be individual peptides. Each amino acid may be represented as a binary string. For a 4-bit string, there are 16 possible combinations (Table 1). If, for example, only 10 of the possible amino acids are used, 6 of these amino acids must be represented twice (e.g., Gly is represented by 1010 and 1011), so that all of the 16 possible combinations are assigned to an amino acid as follows:

TABLE 1

Binary String    Amino Acid
0000             Val
0001             Glu
0010             Leu
0011             Pro
0100             Lys
0101             Ser
0110             Ala
0111             Val
1000             Phe
1001             Glu
1010             Gly
1011             Gly
1100             Ser
1101             Phe
1110             Gln
1111             Pro

[0087] An initial population of tetrapeptides can be generated using a random number generator. Structures can be modified at this point because of possible synthetic difficulties or to ensure that each amino acid is represented at each position, etc. Assume the following set of chromosomes (peptides) are generated:

TABLE 2

SEQ ID NO:    Peptide            Binary String
1             Gly-Ala-Leu-Gly    1010011000101010
2             Gln-Gly-Val-Glu    1110101000000001
3             Ser-Ala-Pro-Val    0101011000110000
4             Ser-Pro-Ala-Gln    0101001101101110
5             Glu-Glu-Val-Phe    0001000100001000
6             Val-Leu-Ser-Lys    0000001001010100
7             Val-Ser-Glu-Leu    0000010100010010
8             Pro-Phe-Glu-Pro    0011100000010011
9             Glu-Leu-Gln-Glu    0001001011100001
10            Lys-Val-Gln-Phe    0100000011101000
11            Gly-Lys-Ala-Pro    1010010001100011
12            Ala-Gln-Lys-Ser    0110111001000101
13            Ala-Gln-Gly-Glu    0110111010100001
14            Lys-Glu-Phe-Gly    0100000110001010
15            Pro-Ser-Phe-Lys    0011010110000100
16            Phe-Ser-Leu-Ala    1000010100100110
17            Leu-Phe-Gly-Ala    0010100010100110
18            Glu-Val-Lys-Ser    0001000001000101
19            Val-Gly-Glu-Ala    0000101000010110
20            Gln-Glu-Ser-Gln    1110000101011110

[0088] If, for example, the computer decided to cross Gly-Ala-Leu-Gly (SEQ ID NO:1) and Ser-Ala-Pro-Val (SEQ ID NO:3) at their mid-points, it would add two new children chromosomes/peptides to the population to test: Gly-Ala-Pro-Val (SEQ ID NO:21) and Ser-Ala-Leu-Gly (SEQ ID NO:22).
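The mid-point cross described above can be reproduced with a short sketch that uses the 4-bit assignments of Table 1. The function names are illustrative, and the choice of the lowest binary string when an amino acid has two codes is an assumption made so that encoding is deterministic.

```python
# 4-bit codes from Table 1 (model data).
CODE_TO_RESIDUE = {
    "0000": "Val", "0001": "Glu", "0010": "Leu", "0011": "Pro",
    "0100": "Lys", "0101": "Ser", "0110": "Ala", "0111": "Val",
    "1000": "Phe", "1001": "Glu", "1010": "Gly", "1011": "Gly",
    "1100": "Ser", "1101": "Phe", "1110": "Gln", "1111": "Pro",
}

# Assumption: when a residue has two codes, encode with the lower one.
RESIDUE_TO_CODE = {}
for code in sorted(CODE_TO_RESIDUE):
    RESIDUE_TO_CODE.setdefault(CODE_TO_RESIDUE[code], code)


def encode(peptide):
    """'Gly-Ala-Leu-Gly' -> 16-bit chromosome string."""
    return "".join(RESIDUE_TO_CODE[r] for r in peptide.split("-"))


def decode(bits):
    """16-bit chromosome string -> hyphenated peptide."""
    return "-".join(CODE_TO_RESIDUE[bits[i:i + 4]] for i in range(0, len(bits), 4))


def single_point_crossover(a, b, point):
    """Swap the tails of two chromosomes after bit position `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


parent_1 = encode("Gly-Ala-Leu-Gly")  # SEQ ID NO:1
parent_3 = encode("Ser-Ala-Pro-Val")  # SEQ ID NO:3
child_a, child_b = single_point_crossover(parent_1, parent_3, len(parent_1) // 2)
print(decode(child_a), decode(child_b))  # Gly-Ala-Pro-Val Ser-Ala-Leu-Gly
```

Because two binary strings can decode to the same amino acid, a single-bit mutation does not always change the peptide (e.g., 1010 and 1011 both decode to Gly).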

[0089] Genetic algorithms are described in more detail, e.g., in J. Sing et al., (1996) J. Am. Chem. Soc. 118:1669 and Handbook of Genetic Algorithms, Van Nostrand Reinhold: New York, 1991.

[0090] Referring back to FIG. 1, the test compounds can be all or a subset of the compounds in the first test library. The first test library can be selected on the basis of any criterion known in the art. For example, the first test library may include all possible pentapeptides (naturally occurring and/or non-naturally occurring). Alternatively, the first library may contain all possible pentapeptides based on a set of ten possible amino acids. As a further non-limiting example, all of the compounds in the first test library may have a specific subunit designated in a particular position(s) (e.g., the first amino acid may always be an alanine or an aromatic amino acid). It is not necessary that all of the test compounds in the first test library actually be synthesized and/or isolated, e.g., the library may be a “virtual” library. Typically, it is only necessary that the test compounds for which an indicia of the property is determined actually be synthesized, alternatively isolated, and evaluated. Alternatively, it is possible that indicia of the property of a particular compound(s) may be determined from other sources (e.g., the literature or from previous studies or studies from other investigators) and, therefore, this compound(s) would not have to be synthesized and a measurement of the indicia of the property of interest determined.

[0091] It is preferred that the present invention be carried out to screen a peptide library to identify peptides for use as components of culture medium. There are no particular requirements of the peptide library to be used to carry out this embodiment of the invention. The peptides in the library may contain naturally-occurring and/or synthetic amino acids. The library may also contain modified amino acids (e.g., phosphorylated, methylated, glycosylated, and the like). Moreover, the peptide library can be defined to contain less than all of the possible naturally-occurring and/or synthetic amino acids. The peptide library may also be defined so that all of the peptides therein have the same length or range of lengths. Alternatively, the peptides in the library may vary in length, e.g., tetramers, pentamers and/or hexamers. Peptide libraries in which all of the peptides have the same length (e.g., 4, 5, 6, 7, 8, 9, 10 or more amino acids) are preferred. In preferred embodiments, the peptide library contains peptides having a length of four, five, six, seven, eight, nine, or ten amino acids, or longer.

[0092] In other preferred embodiments, the peptide library contains peptides having a length in a range from about four amino acids to about twenty amino acids, more preferably, from about four amino acids to about ten amino acids. In alternate preferred embodiments, one or more amino acid positions in the peptides is fixed (i.e., nonvariable) or limited to specified particular amino acid(s) or class(es) of amino acids. For example, in a library of pentapeptides, the amino acids at positions 4 and 5 might be fixed as a specific amino acid (e.g., Ala or Val) or class of amino acids (e.g., aromatics). Likewise, the peptides may be 20-mers, but only 5 of the positions may be variable with the other positions being fixed. The positions may be fixed based on any criteria, e.g., random assignment, prior chemical knowledge, ease of manufacturing and/or synthesis, cost, and the like.

[0093] Those skilled in the art will appreciate that fixing or limiting the possible amino acid(s) at a particular position or positions will reduce the total number of possible peptides and may likewise decrease the time, expense and/or technical difficulties associated with synthesizing and testing peptides, identifying leads, and follow up (if necessary).
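The reduction obtained by fixing or limiting positions can be quantified with a simple count. The sketch below is illustrative only: the function name and the `restricted` mapping (1-based position to number of permitted amino acids) are assumptions, and a default alphabet of the 20 naturally occurring amino acids is assumed.

```python
def library_size(length, alphabet_size=20, restricted=None):
    """Number of distinct sequences of `length` residues.

    `restricted` maps a 1-based position to the number of amino acids
    permitted at that position (1 for a fully fixed position).
    """
    restricted = restricted or {}
    size = 1
    for position in range(1, length + 1):
        size *= restricted.get(position, alphabet_size)
    return size


print(library_size(4))                              # all tetramers: 160000
print(library_size(5))                              # all pentamers: 3200000
print(library_size(5, restricted={4: 1, 5: 1}))     # positions 4 and 5 fixed: 8000
print(library_size(20, restricted={i: 1 for i in range(6, 21)}))  # 20-mer, 5 variable: 3200000
```

Under these assumptions, fixing two positions of a pentapeptide library reduces it from 3,200,000 to 8,000 sequences, and a 20-mer with only five variable positions is no larger than the full pentapeptide library.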

[0094] As a further aspect, the present invention provides a method of screening a compound library (preferably a peptide library) in which at least one of the amino acid positions is nonvariable or limited to designated subunits (i.e., less than all possible subunits, e.g., amino acids). A compound of interest, e.g., as a component of a culture medium, may be identified in the first round of screening. Alternatively, leads are identified and successive screenings are performed as described herein.

[0095] Operations are performed to determine a desired indicia, or alternatively range of indicia, of the property to establish a test requirement against which the measured indicia of the property of the media containing the test compounds are compared, Block 106. The test requirement may be determined at any stage in the process of identifying a culture medium component. For example, the test requirement may be set a priori or, alternatively, it may be determined after the indicia of the property of the first plurality of culture media each containing a first test compound is determined. Moreover, the test requirement may change during the compound screening process.

[0096] The test requirement may represent a threshold level, and indicia of the property falling at or above the test requirement may be desirable (e.g., when screening for compounds that increase antibiotic production), or it may represent a ceiling, and values falling below the test requirement may be desirable (e.g., when screening for compounds that suppress endotoxin production during fermentation processes). As a further alternative, the test requirement may relate to a range of desired indicia, i.e., the test requirement may establish both a floor and a ceiling for the measured indicia (e.g., to reach a balance between competing factors, such as cell growth and protein production). Those skilled in the art will appreciate that the test requirement may represent the optimal indicia of the property (e.g., maximal immunogen production); alternatively, the test requirement may take into account other criteria such as feasibility, cost, time constraints, effects on other desired properties of the culture medium, etc.

[0097] As yet a further alternative, the test requirement may be qualitative, rather than quantitative, in nature, e.g., if one is looking for the absence/presence of a particular response (i.e., a yes/no answer). Those skilled in the art will recognize that for computational analysis of qualitative data, the qualitative values will most likely be converted into quantitative values (e.g., response/no response→1/0).
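As one possible way of expressing such test requirements computationally, the sketch below represents a threshold, a ceiling, or a range with optional `floor` and `ceiling` values, and converts a qualitative response into 1/0. The names are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from typing import Optional


def satisfies(value: float,
              floor: Optional[float] = None,
              ceiling: Optional[float] = None) -> bool:
    """True if a measured indicium meets the test requirement.

    floor only   -> threshold (value at or above the floor is desirable)
    ceiling only -> ceiling (assumed inclusive here)
    both         -> range of desired indicia
    """
    if floor is not None and value < floor:
        return False
    if ceiling is not None and value > ceiling:
        return False
    return True


def to_quantitative(response: bool) -> int:
    """Convert a qualitative response/no response result to 1/0."""
    return 1 if response else 0


print(satisfies(12.0, floor=10.0))               # threshold met: True
print(satisfies(12.0, ceiling=10.0))             # ceiling exceeded: False
print(satisfies(7.5, floor=5.0, ceiling=10.0))   # within range: True
print(satisfies(to_quantitative(True), floor=1)) # qualitative response: True
```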

[0098] Operations are performed to determine a relationship between at least one parameter of the first test compounds and the measured indicia of the property for each of the first test compounds, Block 104. Preferably, the relationship is a mathematical relationship, more preferably a mathematical structure-activity relationship between the parameter(s) and the property (i.e., activity) of interest. There is no particular limit to the number of parameters used to determine the relationship, and two, three, four, five, six, seven, eight, nine, ten or more parameters can be used. In one illustrative example, a relationship is determined between three parameters (molecular weight, hydrophobicity, total charge) of a plurality of peptides and the measured cell growth or P-toxin production by cultures of Clostridium perfringens.
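The form of the mathematical relationship is not limited to any particular model. As a minimal sketch, assuming a linear relationship fitted by ordinary least squares, three parameters per compound can be related to a measured indicium as follows. The function names and the model data (arbitrary standardized units, not measured values) are assumptions for illustration.

```python
def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("parameters are collinear; cannot fit relationship")
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, params):
    """Predicted indicium for a compound with the given parameters."""
    return coef[0] + sum(c * p for c, p in zip(coef[1:], params))


# Model data: (molecular weight, hydrophobicity, total charge), standardized.
MODEL_PARAMS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
MODEL_INDICIA = [2.5, 1.0, 5.0, 4.5, 2.0]

COEF = fit_linear(MODEL_PARAMS, MODEL_INDICIA)
UNTESTED_PREDICTION = predict(COEF, (1, 1, 0))
```

The fitted coefficients can then be applied to the parameters of untested compounds to predict which of them are expected to satisfy a test requirement.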

[0099] Any parameter (i.e., descriptor) known in the art that can be applied to characterize a compound may be used to carry out the present invention. Physical, chemical (including biochemical), biological and/or topological parameters may be employed to determine the relationship. The term “parameter” as used herein is also intended to encompass the principal components of S. Hellberg et al., (1987) J. Med. Chem. 30:1126 (e.g., z1, z2, z3). The parameter(s) used to describe the test compounds can change in both number and type during the selection process. In addition, the parameter(s) can be a whole molecule parameter(s), sequence specific parameter(s), or a combination of both.

[0100] Preferably, the compounds are characterized using at least one whole molecule parameter (e.g., one, two, three, four, five, six, seven, eight, nine, ten or more whole molecule parameters). Also preferred are embodiments wherein the compounds are characterized using only whole molecule parameters. A “whole molecule parameter” is a value that characterizes a molecule irrespective of the arrangement of its constitutive atoms. For example, a whole molecule parameter for a peptide is one that does not depend on the order or sequence of the amino acids in the peptide. Describing a molecule using at least one whole molecule parameter may facilitate the compound screening process because it reduces (i.e., collapses) the size of the compound space and can thereby decrease the time, computational difficulty, and cost of screening large compound spaces (as described in more detail below).
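To illustrate why whole molecule parameters collapse the compound space, the sketch below computes a molecular weight that is the same for every ordering of a peptide's amino acids, and counts the distinct unordered compositions for a given length. The names are illustrative, the residue masses are approximate average values for three amino acids only, and treating each unordered composition as one candidate is an assumption made for this example.

```python
from itertools import permutations
from math import comb

# Approximate average residue masses (Da); water is added once per peptide.
RESIDUE_MASS = {"Gly": 57.05, "Ala": 71.08, "Leu": 113.16}
WATER_MASS = 18.02


def molecular_weight(peptide):
    """Whole molecule parameter: independent of the order of the residues."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER_MASS


def composition(peptide):
    """Order-independent key shared by all sequence isomers of a peptide."""
    return tuple(sorted(peptide))


def n_compositions(length, alphabet_size=20):
    """Number of unordered compositions (multisets) of the given length."""
    return comb(alphabet_size + length - 1, length)


ISOMERS = set(permutations(("Gly", "Ala", "Leu", "Gly")))
print(len(ISOMERS))        # 12 distinct sequences share one composition
print(n_compositions(4))   # 8855 compositions versus 160000 tetramer sequences
print(n_compositions(5))   # 42504 compositions versus 3200000 pentamer sequences
```

Under these assumptions, the 160,000 tetramer sequences collapse to 8,855 compositions, each of which can later be expanded into its constituent sequence isomers.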

[0101] Conversely, a “sequence-specific” parameter is one that is dependent on the specific order or sequence of the constitutive atoms or subunits. Examples of particular sequence-specific and whole molecule parameters have been provided hereinabove.

[0102] Illustrative parameters that may be employed according to the present invention include but are not limited to molecular weight, charge, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity (e.g., as exemplified by measurements such as logP, HPLC retention times, or other methods of determining hydrophobicity known in the art) of the whole molecule or individual building block in the molecule (e.g., peptide, amino acid, nucleic acid, sugar unit, etc.). Any suitable topological parameter known in the art may be employed, such as those described by L. B. Kier and L. H. Hall, Molecular Connectivity in Structure-activity Analysis, Research Studies Press, John Wiley & Sons, Letchworth England (1986); M. Johnson et al., Concepts and Applications of Molecular Similarity, John Wiley & Sons, New York (1990); and R. P. Sheridan et al., (1995) J. Chem. Inf. Comput. Sci. 35:310.

[0103] Calculations of parameters can be carried out by any method known in the art, for example, using a computerized system, e.g., a Silicon Graphics computer or a PC. Total charge, molecular weight, and total dipole can be calculated using Sybyl 6.5 (Tripos). Moriguchi logP (i.e., mlogP, a measure of hydrophobicity) can be calculated using a Sybyl Programming Language Script. Literature values of electronic charge index and isotropic surface area for amino acids are available, see, e.g., E. R. Collantes et al., (1995) J. Med. Chem. 38:2705. A variation of electronic charge index can be prepared in an analogous manner using Gasteiger charges supplied by Sybyl instead of CNDO/2 charges used by Collantes et al. (Id.). Principal component descriptors z1, z2, and z3 are provided by Hellberg et al., (1987) J. Med. Chem. 30:1126. Calculations of the isoelectric point can be carried out using a Sybyl Programming Language Script.

[0104] The relationship between the at least one parameter of the test compounds and the measured indicia of the property for each of the test compounds is used to identify a second plurality of culture media. Each of the second culture media contains a second compound(s) from within a second test library, where the second plurality of culture media are predicted to give indicia of the property that satisfy the test requirement, Block 108. Typically, the second test compounds will be untested compounds although those skilled in the art will appreciate that one or more compounds from the first set of test compounds may be included in the second set of test compounds, e.g., as controls.

[0105] In particular embodiments, the second test library includes all compounds that are predicted to satisfy the test requirement. Alternatively, and preferably, the second test library is chosen to include a subset of the total number of compounds that satisfy the test requirement. The second set of test compounds may include all of the test compounds in the second test library or, alternatively, a subset thereof. For example, the second test library may include all peptides having five amino acids that are predicted to result in antibody production from cultured hybridoma cells above a particular level (i.e., the test requirement) when added to culture medium. Alternatively, and preferably, the second test compounds are selected from, and more preferably are representative of, the second test library. Yet more preferably, the second test compounds are selected from the second test library using a space-filling design, as described above.

[0106] In particular embodiments of the invention, indicia of the property of the second plurality of culture media are measured, and the indicia are compared with the test requirement. A lead compound that satisfies the test requirement may be identified at this stage. Alternatively, the above-described process of FIG. 1 is carried out in an iterative fashion. A second relationship between at least one parameter of the second test compounds and the measured indicia will be determined, and a third set of test compounds from a third compound library is identified. As a further alternative, if the first test library provides a suitable compound, the screening process can end there without the need to generate a second test library or to engage in further compound screening.

[0107] Those skilled in the art will appreciate that the systematic methods described hereinabove can be supplemented by knowledge of the chemical behavior to select a follow-up library of compounds. For example, in screening a peptide library, it may become apparent that peptides containing amino acids with large aromatic groups exhibit desired indicia of the property. Accordingly, a follow-up library may be chosen that is enriched in such peptides. Alternatively, if a desired end product is composed of an abundance of one or more types of amino acids, peptides containing these amino acids might be selected for screening, in particular if it is known that the particular cell line cannot synthesize any or sufficient quantities of the amino acid(s). As a further alternative, one may choose to make the carboxyl-terminal groups of a peptide as amides or acids based on prior knowledge, e.g., these features are known to enhance activity. Similarly, one might synthesize one library of peptides that all have carboxyl-terminal acids and a second library of peptides with carboxyl-terminal amides. If one library performs better than the other, one might use only peptides with carboxyl-terminal acids or amides for the remainder of the screening iterations.

[0108] In one preferred embodiment of the invention, as diagrammed in FIG. 1, the relationship may be expressed in the form of ŷ_(i)=f(x_(ij)), where x_(ij) denotes a parameter, i ranges from 1 to n, where n represents the number of first culture media, j ranges from 1 to d, where d represents the number of parameters measured, and ŷ_(i) represents an estimate of the measured first indicia of the property, Block 104. The relationship represented by ŷ_(i)=f(x_(ij)) may be a parametric or non-parametric formula.

[0109] In one particular embodiment, the relationship is a quantitative structure-activity relationship (QSAR). This aspect of the invention can be demonstrated using an illustrative example, as follows.

[0110] The present invention may be used to identify a peptide to include as a component of a culture medium. The culture medium may be used to culture bacterial cells genetically engineered to produce a heterologous protein of interest. Accordingly, it would be desirable to identify a peptide which, when included in a culture medium, will enhance protein production by bacterial cells grown in the culture medium (i.e., to satisfy a test requirement).

[0111] The following discussion is provided to illustrate the algorithm using exemplary data. In this example, eight test peptides are selected from a tetrapeptide library: DKAH, DWPA, ESMH, GVNE, HEDV, ETGS, HYGV, and DFGV (SEQ ID NO:23 to SEQ ID NO:30; Table 3). The test peptides may be selected from the library by any means known in the art. The values for three parameters (molecular weight, total charge, and mlogP, i.e., hydrophobicity) may be determined for each of the eight peptides. The indicia of the property, in this example a particular biological activity (i.e., protein production), may be determined for the eight peptides as well. The exemplary data are shown in Table 3.

TABLE 3
SEQ ID NO:  Peptide  Hydrophobicity  Mol. Wt  Total Charge  Biol. Act.
23          DKAH     −3.479          469.499    0           15.0
24          DWPA     −1.608          486.505   −1           25.0
25          ESMH     −3.479          501.535   −1           19.3
26          GVNE     −3.421          416.411   −1           14.4
27          HEDV     −4.03           496.477   −2           18.5
28          ETGS     −4.25           391.357   −1           10.2
29          HYGV     −1.278          474.518    0           23.6
30          DFGV     −1.616          435.457   −1           22.0

[0112] Using regression analysis, e.g., with the program S-Plus (Version 3.4 for Solaris, Mathsoft, Seattle, Wash.), the following equation can be derived to describe the relationship between the three parameters and the (hypothetical) indicia of the property (i.e., biological activities) of the first set of test compounds:

ŷ = 3.64*logP + 0.056*MW − 1.97*charge + 1.73, R² = 0.999  (1)

[0113] where ŷ is an estimated indicia of the property, logP is a measure of hydrophobicity, MW is molecular weight, and charge is the total charge of the peptide.
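The kind of regression fit described above can be sketched in a few lines of Python. This is an illustrative sketch only: it uses ordinary least squares solved through the normal equations, not the S-Plus program named in the text, and the function names (fit_least_squares, predict, r_squared) are hypothetical. The data are the Table 3 values; the fitted coefficients should land near those of Equation 1, although small differences from rounding are to be expected.

```python
# Table 3 data: (mlogP, molecular weight, total charge) and biological activity.
PARAMS = [
    (-3.479, 469.499, 0), (-1.608, 486.505, -1),
    (-3.479, 501.535, -1), (-3.421, 416.411, -1),
    (-4.03, 496.477, -2), (-4.25, 391.357, -1),
    (-1.278, 474.518, 0), (-1.616, 435.457, -1),
]
ACTIVITY = [15.0, 25.0, 19.3, 14.4, 18.5, 10.2, 23.6, 22.0]


def fit_least_squares(rows, y):
    """Solve the normal equations (X'X)b = X'y; returns [intercept, b1, ...]."""
    X = [[1.0, *r] for r in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


def predict(coef, row):
    """Apply the fitted linear model to one (mlogP, MW, charge) row."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def r_squared(coef, rows, y):
    """Coefficient of determination of the fit on the training data."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


coef = fit_least_squares(PARAMS, ACTIVITY)
print("intercept, logP, MW, charge:", [round(b, 3) for b in coef])
print("R^2:", round(r_squared(coef, PARAMS, ACTIVITY), 4))
```

The normal-equation solver is adequate for a three-parameter illustration; a statistics package would normally be preferred for larger or ill-conditioned parameter sets.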

[0114] If a satisfactory peptide (i.e., one that satisfies the test requirement) is not identified among the first set of test peptides, the screening process will continue. A second set of untested peptides can then be selected by any means known in the art, and the parameters for the second set of peptides may be calculated. Using Equation 1, the predicted activity of a second set of culture media, where each of the culture media in the set contains one of the second test peptides, can be calculated for each culture medium in the second set based on the parameters of the peptide included therein. For example, a predicted activity of 28.2 was derived for a culture medium containing the untested peptide HYPV (SEQ ID NO:31; Table 4). This value is higher than any of the biological activities in the original library, and, thus, this peptide would be a good candidate for synthesis and testing.

TABLE 4
Peptide               Hydrophobicity  Mol. Wt  Total Charge  Predicted Biol. Act.
HYPV (SEQ ID NO:31)   −0.645          514.583   0            28.2
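The HYPV prediction can be checked by direct substitution into Equation 1. The short Python sketch below does only that arithmetic; the function name predict_activity and the constant names are hypothetical, while the coefficients and the HYPV parameters are copied from Equation 1 and Table 4.

```python
# Coefficients copied from Equation 1.
COEF_LOGP = 3.64
COEF_MW = 0.056
COEF_CHARGE = -1.97
INTERCEPT = 1.73


def predict_activity(mlogp, mol_wt, charge):
    """Return the Equation 1 estimate of biological activity."""
    return COEF_LOGP * mlogp + COEF_MW * mol_wt + COEF_CHARGE * charge + INTERCEPT


# Parameters for the untested peptide HYPV, taken from Table 4.
hypv = predict_activity(-0.645, 514.583, 0)
print(round(hypv, 1))  # 28.2
```

Substituting the DKAH parameters from Table 3 in the same way gives an estimate close to, but not identical with, its listed activity of 15.0, as expected for a fitted relationship.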

[0115] If the test requirement is for protein production at a level of at least 25, then the compound screening process may end with the identification of HYPV (SEQ ID NO:31) (assuming the actual biological activity is equal to the predicted activity). Alternatively, if the test requirement is set for protein production of at least 30, then the screening process would continue. The actual indicia of the property of a second set of culture media, each containing one of the second test peptides, may be determined. From these measurements, a new relationship between at least one parameter and biological activity is calculated. From this updated equation, a third set of peptides is identified, which when included in culture media are predicted to promote protein production by the bacteria at a level of 30 or greater. Typically, this process can continue in an iterative fashion until a peptide having the desired biological activity is identified.

[0116] Likewise, if the test requirement were set at a level of at least 20, then three of the original test peptides would satisfy the test requirement (i.e., DWPA, HYGV, and DFGV; SEQ ID NO:24, SEQ ID NO:29, and SEQ ID NO:30, respectively; Table 3), and the compound screening process could stop at this point or could continue to look for even better performing peptides.

[0117] Referring to FIG. 2, preferred operations for determining the relationship between the measured indicia of the property of the plurality of first culture media, each containing a respective test compound, and the parameter(s) of the test compounds can be illustrated by a graphical representation. The calculated values of the d parameter(s) are plotted, and the measured values of the indicia of the property for the n culture media are plotted against the parameter values in d+1 dimensional space. For ease of illustration only, in FIG. 2, n=10 culture media and d=1 parameter.

[0118] Conventional line-fitting algorithms can be used to generate a “best fit” line for the plotted data. For example, regression analysis can be utilized to determine a mathematical relationship between the indicia of the property and the value of the parameter for the test compound in each culture medium. The relationship can be represented as ŷ_(i)=f(x_(ij)), where x_(ij) denotes a parameter, i ranges from 1 to n where n represents the number of first culture media, j ranges from 1 to d where d represents the number of parameters, and ŷ_(i) represents an estimate of the measured first indicia of the property of the first culture media.

[0119] The relationship can be used to identify a second plurality of culture media, each containing a respective second test compound which is predicted to provide indicia of the measured property that satisfy the test requirement. In FIG. 2, the test requirement has been established to select for compounds that provide indicia of greater than 20 units. The equation ŷ_(i)=f(x_(ij)) can be employed to identify those compounds as components of culture medium that will provide indicia of the property that lie on the upper right portion of the line of FIG. 2 (i.e., provide indicia of the property of greater than 20 units).

[0120] Alternatively, a distance function can be calculated to identify compounds which, when added to culture media, are predicted to provide indicia of a property of interest that satisfy a test requirement. In general, the compounds are identified based on their proximity in parameter space to known lead compounds. According to a preferred embodiment, shown in FIG. 3, the present invention is used to identify culture medium components based on the parameters of the culture medium components, Block 300. Operations are performed to measure first indicia of a property of interest for a first plurality of culture media which each contain a first test compound chosen from within a first test library based on a space-filling design, and a test requirement relating to the measured first indicia is determined, Blocks 302 and 304. Operations to carry out Blocks 300, 302 and 304 are as described above for Blocks 100, 102 and 106, respectively.

[0121] The distance function can be expressed as d(x₁, x₂) between a first value of a parameter, x₁, of a first test compound and a second value of the same parameter, x₂, of a second untested compound, Block 306. This relationship will assign to culture media containing a second untested compound an estimated indicia of the property that corresponds to the measured indicia determined from a culture medium containing a first tested compound from the first test library if d(x₁, x₂) ≤ d_(cutoff1), where d_(cutoff1) is a cutoff distance for the first test compound, Block 308. In other words, once a lead compound is identified from the first test library, additional lead compounds can be determined based on an assumption that compounds that are close in parameter space will exhibit similar or better activities, Block 310.

[0122] In particular embodiments of the invention, x₁ and x₂ represent a single parameter or, alternatively, a set of parameters, i.e., x₁=x₁₁, x₁₂, x₁₃, x₁₄ . . . x_(1k) and x₂=x₂₁, x₂₂, x₂₃, x₂₄ . . . x_(2k), where k ≥ 1. One specific example of a method of determining a relationship based on distance in parameter space is “nearest neighbor” analysis. Other non-limiting and illustrative methods are cluster analysis, self-organizing maps, and machine learning approaches. See generally, B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, New York (1996). Typically, in this type of analysis, the parameters are established a priori, rather than by determining which parameters to evaluate based on a relationship, as described above.

[0123] One particular, and preferred, method of identifying compounds based on a distance function is nearest neighbor analysis. This method can be illustrated using the following simplified example with model data.

[0124] According to this illustrative example, the present invention can be used to identify a peptide as a culture medium component, e.g., for mammalian hybridoma cells producing and secreting antibodies. In particular, it is desirable to identify peptides which, when added to culture medium, will promote antibody production by hybridoma cells at a level greater than a test requirement. Four test peptides (DKAH, DWPA, ESMH, GVNE; SEQ ID NO:23 to SEQ ID NO:26, i.e., a training set) may be selected from a peptide library as described above with respect to the QSAR example. Values to describe the various parameters of the peptides, for example, hydrophobicity (i.e., mlogP), molecular weight, and total charge, may be calculated for each peptide (Table 5). Each peptide may be added to hybridoma culture medium, and antibody production (i.e., biological activity) may be measured for the cells cultured in each culture medium (values shown in Table 5).

TABLE 5
SEQ ID NO:  Peptide  Hydrophobicity  Mol. Wt  Total Charge  Biol. Act.
23          DKAH     −3.479          469.499    0           15.0
24          DWPA     −1.608          486.505   −1           25.0
25          ESMH     −3.479          501.535   −1           19.3
26          GVNE     −3.421          416.411   −1           14.4

[0125] Assume that there is a second set of untested (i.e., candidate) peptides as shown in Table 6.

TABLE 6
SEQ ID NO:  Peptide  Hydrophobicity  Mol. Wt  Total Charge  Biol. Act.
27          HEDV     −4.03           496.477   −2           ?
28          ETGS     −4.25           391.357   −1           ?
29          HYGV     −1.278          474.518    0           ?
30          DFGV     −1.616          435.457   −1           ?

[0126] The idea of the nearest neighbor rule is to find candidate peptides with parameters that are similar to those of the peptide(s) with the “best” (in this case highest) observed biological activity, i.e., the lead peptide(s). Before performing any calculations, typically all parameters will be standardized so that each makes an equal contribution to the nearest neighbor calculation. In this illustrative example, all parameters may be standardized so that they lie between the values of 0 and 1. A standardized value may be computed in the following manner:

Standardized value = (Original value − Min. value)/(Max. value − Min. value)  (2)

[0127] For example, the standardized value of molecular weight for the peptide DKAH (SEQ ID NO:23), with the minimum and maximum taken over all eight peptides of Tables 5 and 6, may be calculated as follows:

(469.499-391.357)/(501.535-391.357)=0.7092  (3)

[0128] The standardized parameter values for the eight peptides are displayed below in Table 7.

TABLE 7
SEQ ID NO:  Peptide  Hydrophobicity  Molecular Weight  Total Charge
23          DKAH     0.2594          0.7092            1
24          DWPA     0.889           0.8636            0.5
25          ESMH     0.2594          1                 0.5
26          GVNE     0.2789          0.2274            0.5
27          HEDV     0.074           0.9541            0
28          ETGS     0               0                 0.5
29          HYGV     1               0.7548            1
30          DFGV     0.8863          0.4003            0.5
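The standardization of Equation 2 can be reproduced with a short Python sketch. The function name standardize and the dictionary layout are hypothetical; the raw values are those of Tables 5 and 6, and the minimum and maximum of each parameter are taken over all eight peptides, as in Equation 3.

```python
# Raw parameters from Tables 5 and 6: (mlogP, molecular weight, total charge).
RAW = {
    "DKAH": (-3.479, 469.499, 0),
    "DWPA": (-1.608, 486.505, -1),
    "ESMH": (-3.479, 501.535, -1),
    "GVNE": (-3.421, 416.411, -1),
    "HEDV": (-4.03, 496.477, -2),
    "ETGS": (-4.25, 391.357, -1),
    "HYGV": (-1.278, 474.518, 0),
    "DFGV": (-1.616, 435.457, -1),
}


def standardize(table):
    """Min-max scale each parameter column to the range 0..1 (Equation 2)."""
    columns = list(zip(*table.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # Assumes every column varies; a constant column would divide by zero.
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs))
        for name, values in table.items()
    }


STANDARDIZED = standardize(RAW)
for name, values in STANDARDIZED.items():
    print(name, [round(v, 4) for v in values])
```

Rounded to four decimal places, the printed values should agree with Table 7.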

[0129] Once the standardized values have been calculated, nearest neighbors are determined by calculating the Euclidean distances between the peptides in 3-dimensional space (where 3 represents the number of parameters). For example, the distance between DKAH (SEQ ID NO:23) and HYGV (SEQ ID NO:29) is calculated as:

SQRT((0.2594−1)²+(0.7092−0.7548)²+(1−1)²)=0.7420  (4)

[0130] Table 8 contains these calculated distances between the training set of peptides (rows) and the candidate set of peptides (columns).

TABLE 8
SEQ ID NO:  Peptide  HEDV    ETGS    HYGV    DFGV
23          DKAH     1.0461  0.9057  0.7420  0.8593
24          DWPA     0.9604  1.2394  0.5236  0.4633
25          ESMH     0.5362  1.0331  0.9266  0.8675
26          GVNE     0.9056  0.3599  1.0238  0.6315

[0131] The peptides in the candidate set will then be assigned predicted indicia of the property based on the closest peptide in the training set (Table 9). The (hypothetical) biological activities for these four peptides may then be measured as shown in Table 9.

TABLE 9
Candidate Peptide      Closest Peptide        Predicted Biol. Activity  Observed Activity
HEDV (SEQ ID NO:27)    ESMH (SEQ ID NO:25)    19.3                      18.5
ETGS (SEQ ID NO:28)    GVNE (SEQ ID NO:26)    14.4                      10.2
HYGV (SEQ ID NO:29)    DWPA (SEQ ID NO:24)    25.0                      23.6
DFGV (SEQ ID NO:30)    DWPA (SEQ ID NO:24)    25.0                      22.0
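The nearest neighbor assignment can be sketched in Python as follows. The names TRAINING, CANDIDATES, and nearest_neighbor are hypothetical; the coordinates are the rounded standardized values of Table 7, and the activities are the measured values of Table 5. The sketch uses the standard-library function math.dist (Python 3.8 or later) for the Euclidean distance.

```python
import math

# Standardized parameters (Table 7) and measured activity (Table 5).
TRAINING = {
    "DKAH": ((0.2594, 0.7092, 1.0), 15.0),
    "DWPA": ((0.889, 0.8636, 0.5), 25.0),
    "ESMH": ((0.2594, 1.0, 0.5), 19.3),
    "GVNE": ((0.2789, 0.2274, 0.5), 14.4),
}
# Standardized parameters of the untested candidates (Table 7).
CANDIDATES = {
    "HEDV": (0.074, 0.9541, 0.0),
    "ETGS": (0.0, 0.0, 0.5),
    "HYGV": (1.0, 0.7548, 1.0),
    "DFGV": (0.8863, 0.4003, 0.5),
}


def nearest_neighbor(point, training):
    """Return (name, activity) of the training peptide closest to point."""
    name = min(training, key=lambda n: math.dist(point, training[n][0]))
    return name, training[name][1]


PREDICTIONS = {c: nearest_neighbor(p, TRAINING) for c, p in CANDIDATES.items()}
for candidate, (closest, activity) in PREDICTIONS.items():
    print(candidate, "->", closest, activity)
```

Each candidate inherits the activity of its closest training peptide, which is the rule used to fill the "Predicted Biol. Activity" column of Table 9.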

[0132] The test rule is to test candidate peptides that are similar to the best members from the first test library. Thus, in this example, HYGV (SEQ ID NO:29) and DFGV (SEQ ID NO:30) may be synthesized and tested. If either or both of the peptides satisfy the test requirement, the compound screening process may be stopped at this point. Alternatively, if a compound has not yet been identified, or if additional compounds are desired, the process can be continued in an iterative fashion. As a further alternative, the selection and screening process can be continued using a different relationship, e.g., a QSAR relationship as described above. Finally, as described above, if the first screening yields a suitable compound, it may not be necessary to engage in successive rounds of picking a library and screening additional test compounds.

[0133] Referring to FIGS. 4A and 4B, the process of identifying a peptide as a component of culture medium using nearest neighbor analysis is graphically represented. After the actual indicia of the property have been measured, the indicia (y-axis) for each peptide (x-axis) may be plotted in ascending (or conversely, in descending) order, FIG. 4A. Those compounds that satisfy the test requirement (in this case, compounds that provide indicia of the property of greater than 10 units when added to culture medium) are selected as lead compounds, and the parameter space surrounding some or all of these leads is explored further.

[0134] FIG. 4B demonstrates nearest neighbor analysis of a particular lead peptide. For illustrative purposes only, two parameters (e.g., total dipole and hydrophobicity) are employed for the analysis. The standardized values (as described above) for the two parameters are plotted on the x- and y-axis. Concentric circles can be drawn through the parameter space to represent a particular cut-off in Euclidean distance from the lead peptide. In one particular embodiment, a space-filling design is used to find points in parameter space, as indicated by the x's, and to test the peptides (circles) closest to these points. The reason for extending the space around the lead peptide (i.e., concentric circles) is to gather information as to how close peptides must be in parameter space to exhibit similar activities, characteristics, or indicia of the property(ies) of interest.

[0135] The cut-off distance will generally be established for each lead compound. Typically, if the data measured on the first plurality of culture media are clustered together, the cut-off distance will be relatively smaller than if the data points are spread out. Once the cut-off distance has been determined, then a second library of second test compounds that fall within the cut-off space can be identified. The second test compounds are predicted to provide indicia of the property that are similar to or better than those of the closest lead compound. All or a subset of the second test compounds in the second test library are added to a second plurality of culture media and the actual indicia of the property are measured. For example, a space-filling design can be used to select less than all of the second test library for screening. From this second data set, a final compound for use as a culture medium component may be identified. Alternatively, a second set of lead compounds can be determined, and nearest neighbor analysis (or some other relationship, e.g., as described by FIG. 1) can be used to identify a third set of compounds for screening. The screening process can continue as many times as necessary to identify compounds exhibiting acceptable or suitable indicia of the property(ies).
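Selection of second test compounds that fall within the cut-off space of a lead compound can be sketched in Python as follows. This is an illustration under stated assumptions: the function name within_cutoff is hypothetical, the cut-off value of 0.55 is chosen arbitrarily for the example and is not taken from the text, and the coordinates are the standardized values of Table 7 with DWPA (the most active training peptide) treated as the lead.

```python
import math

# Standardized parameters of the lead peptide DWPA (Table 7).
LEAD = (0.889, 0.8636, 0.5)
# Standardized parameters of the untested candidates (Table 7).
CANDIDATES = {
    "HEDV": (0.074, 0.9541, 0.0),
    "ETGS": (0.0, 0.0, 0.5),
    "HYGV": (1.0, 0.7548, 1.0),
    "DFGV": (0.8863, 0.4003, 0.5),
}


def within_cutoff(lead, candidates, cutoff):
    """Return {name: distance} for candidates with d(lead, x) <= cutoff."""
    distances = {name: math.dist(lead, point) for name, point in candidates.items()}
    return {name: d for name, d in distances.items() if d <= cutoff}


# Hypothetical cut-off distance, chosen only for illustration.
selected = within_cutoff(LEAD, CANDIDATES, cutoff=0.55)
print(sorted(selected))  # ['DFGV', 'HYGV']
```

A smaller cut-off would narrow the second test library, and a larger one would widen it, which corresponds to moving between the concentric circles of FIG. 4B.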

[0136] It will be appreciated by those skilled in the art that additional operations can be performed to further optimize the above-described methods for identifying a culture medium component. For example, at any stage in the screening process, the number or types of parameters can be changed. In particular, in preferred embodiments, redundant parameters will be identified and eliminated. It is advantageous to identify those parameters that enhance the discrimination among compounds and to eliminate redundant parameters in order to minimize computational complexity and time and to reduce computer storage space. Redundancy of parameters can be determined by any method known in the art, e.g., Principal Component Analysis. Typically, redundant parameters will be highly correlated with an already-existing parameter.
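Because redundant parameters are typically highly correlated with an existing parameter, a simple pairwise correlation screen can flag candidates for elimination. The Python sketch below uses Pearson correlation as a simpler stand-in for Principal Component Analysis; the function names pearson and redundant_pairs and the threshold of 0.9 are assumptions made for illustration. The columns are the three parameters of Table 3.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.9):
    """Return parameter pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Parameter columns from Table 3, in peptide order SEQ ID NO:23 to NO:30.
COLUMNS = {
    "mlogP": [-3.479, -1.608, -3.479, -3.421, -4.03, -4.25, -1.278, -1.616],
    "mol_wt": [469.499, 486.505, 501.535, 416.411, 496.477, 391.357, 474.518, 435.457],
    "charge": [0, -1, -1, -1, -2, -1, 0, -1],
}

for a, b in combinations(COLUMNS, 2):
    print(a, b, round(pearson(COLUMNS[a], COLUMNS[b]), 3))
print("redundant:", redundant_pairs(COLUMNS))
```

A pair flagged by redundant_pairs would be a candidate for dropping one of its two parameters before the next screening iteration.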

[0137] As a further example of process optimization, a cocktail of compounds for use in formulating culture medium can be identified. A cocktail of compounds may give improved results over a single compound alone (e.g., synergy). In addition, once a compound(s) for addition to the culture medium has been identified, the base formulation may be reformulated to further improve the final medium.

[0138] Alternatively, reformulation of the base medium may occur at any point in the screening process. Those skilled in the art will appreciate that in some situations it will be advantageous to formulate the base medium so that relatively modest changes in the property of interest can be detected, because the effects of lead compounds identified at the early stages of the screening process (which may be relatively small) are masked by some base medium formulations. In other words, a less than optimum base medium may intentionally be selected, at least at the initial stages of the identification process, to maximize the observed impact of the test compounds on cell performance.

[0139] A common reason for reformulating media is to move to a defined or at least a semi-defined medium, which will typically reduce the variance in performance observed from one lot to the next. The more defined recipes are generally more complex and thereby are labor intensive to prepare. In preferred embodiments of the invention, the addition of the compound(s) identified by the inventive methods will permit the omission or, alternatively, the reduction of one or more medium components. The omitted or reduced component may be an undefined component, e.g., serum or a protein hydrolysate. Alternatively, one or more defined media components may be reduced or omitted, e.g., an amino acid, vitamin, mineral, carbohydrate source, lipid source, and the like. It is also desirable to remove or reduce the presence of certain medium components to minimize production costs and to simplify production processes and quality control. Typically, this process would occur in the final stages of media optimization. Components of the base culture medium would be removed, e.g., one or more components at a time, and the property of interest assayed. An experimental design such as a fractional factorial design may be used to assess the contribution of the particular component(s) to the overall performance of the medium. Those components that have no significant effect on the property of interest can be removed or reduced in the final culture medium formulation. Alternatively, it may be decided that a component does have an effect on the property of interest, but it is omitted or reduced in the formulation because of other considerations, e.g., cost, contaminants, and the like, i.e., the advantages of maintaining the component at its current level are outweighed by the disadvantages.

[0140] In the screening process, it may further be preferred to screen compounds in both a chemically-undefined culture medium and a more chemically-defined culture medium. This parallel screening need not be carried out concurrently. During the process of reformulating the medium used to culture a particular cell, tissue, organ, etc., it may be preferable to avoid changes to the cellular population in response to the new medium (i.e., subcloning the population). For example, in the process of moving cultured cells from a complex culture medium containing protein hydrolysates to a more defined medium in which peptides are substituted for the protein hydrolysate, changes may be seen in the population of cultured cells. Those skilled in the art will appreciate that it may be desirable that the cultured cells retain the ability to grow in the complex medium (with or without the compound). The parallel screening process described above will assist in maintaining the characteristics of cells cultured therein. In addition, the maintenance of cell viability and growth in undefined and defined media provides an opportunity to identify compounds that enhance performance in both types of media.

[0141] The terms “chemically defined” and “chemically undefined” culture media are used herein according to their commonly accepted meanings in the art. In general, a “chemically defined” culture medium is a medium formulation in which essentially all of the components therein are known and are present in known concentrations. Alternatively stated, a “chemically defined” culture medium is one in which essentially all of the components can be described in terms of their chemical formulas and are present in known concentrations. A “chemically undefined” culture medium is one in which the identity and/or concentration of some medium component is unknown and only proportionate values such as total amino nitrogen are obtainable. Thus, any medium containing an undefined component also becomes undefined. The term “semi-defined” is typically used in the art to describe a medium that contains only a small amount of undefined material. Examples of undefined components commonly used in media are yeast extract and fetal calf serum.

[0142] In addition, the cells may be “conditioned” or “adapted” prior to the compound screening process by cycling the cells at least once through their current growth medium and the base medium that will be used for the screening process. Typically, the current growth medium is an undefined or semi-defined medium, while the base medium for screening is chemically defined. This conditioning/adaptation process will increase the likelihood that cells will grow in both their former medium and the new base medium. Conditioning/adapting a cell line can enhance the reproducibility of the growth assay and thereby increase assay resolution. In addition, using cells that can grow in both chemically defined and undefined media provides the opportunity to identify medium components for both types of media.

[0143] As a further optional step, the cells may undergo one or two periods of incubation in base medium alone (i.e., lacking the test compound(s)) prior to being exposed to the test compound(s). Passing the cells through one or more incubation periods in base medium has a two-fold effect. First, it prevents carry-over of undefined components from the previous culture medium, which carry-over may skew the screening results. Second, the incubation period allows the cells to adapt to the new basal medium, so that the measured results should be reflective of the individual test compounds.

[0144] It is not necessary that the inventive methods described herein result in a fully-defined culture medium. The final medium formulation may contain serum, protein hydrolysate, or other undefined medium components.

[0145] Moreover, the inventive screening methods described herein may be carried out with a base medium from which undefined protein components are absent. Alternatively, the base medium may contain an undefined protein component(s), as long as the effects of lead compounds are not masked by the presence of the undefined protein component. In one particular embodiment in which a medium for culturing bacteria is being formulated, the base medium contains less than about 10% (w/v) (e.g., from about 0.0001% to about 10% (w/v)), preferably about 0.1% (w/v) to about 2.5% (w/v), more preferably about 0.25% (w/v) to about 1% (w/v), of an undefined protein component. In another particular embodiment in which a medium for culturing animal (e.g., mammalian) cells is being formulated, the base medium used in the screening process contains serum. Preferably the serum concentration is less than about 30% (v/v), more preferably less than about 20% (v/v), in the base culture medium. Alternatively, the serum concentration is preferably from about 0.05% to about 30% (v/v), more preferably about 1% (v/v) to about 30% (v/v), still more preferably about 5% (v/v) to about 20% (v/v) in the base culture medium.

[0146] Exemplary undefined protein components include but are not limited to hydrolysates (e.g., produced by chemical cleavage, such as casamino acids), digests (e.g., enzymatic digests, such as tryptone, proteose peptone, and the like), extracts (e.g., yeast extract) and infusions (e.g., organ or tissue infusions, such as brain-heart infusions), as those terms are understood in the art. The starting material for the undefined protein component is typically yeast, slaughterhouse offal, milk proteins or other proteins (e.g., gelatin), tissues, or organs. Exemplary sources of sera include but are not limited to fetal calf serum, horse serum, and the like.

[0147] Another preferred embodiment of the present invention is an apparatus for identifying a compound(s) as a component for culture medium based on at least one parameter(s) of the compound(s). A preferred apparatus 500 is shown in FIG. 5. This preferred apparatus comprises means 502, such as the aforementioned measurement tool, for measuring indicia of the property of interest from a plurality of culture media each containing a test compound(s). As described above, any measurement tool known in the art may be used to take the measurements, e.g., a spectrophotometer for absorption or colorimetric measurements, a fluorometer or flow cytometer for fluorescence measurements, a scintillation or gamma counter for radioactive measurements, and an automated cell counter, automated plate counter, or manual plate counter for cell number measurements. As a further example, a microwell reader can be used for fluorescence, absorbance or colorimetric measurements. In the preferred apparatus 500, the measurement tool is a spectrophotometer, more preferably, a microwell reader for measuring optical density. The apparatus 500 also operates under computer control. In particular, the measurement tool 502 is preferably operatively coupled to a general purpose or application specific computer controller 504. The controller 504 preferably comprises a computer program product(s) for controlling operation of the measurement tool 502 and performing numerical operations relating to the above-described steps. The controller 504 may accept set-up and other related data via a file 506, disk input 508, or data bus 510. A display 512 and printer 514 are also preferably provided to visually display the operations performed by the controller 504.

[0148] It will be understood by those having skill in the art that the functions performed by the controller 504 may be realized in whole or in part as software modules running on a general purpose computer system. Alternatively, a dedicated stand-alone system with application-specific integrated circuits for performing the above-described functions and operations may be provided. In particular, a preferred computer program product will comprise a computer readable storage medium having computer-readable program code means embodied in the medium. The preferred computer-readable program code means comprises computer-readable program code means for performing the operations described with respect to FIGS. 1 and 3 and throughout the present description.

[0149] Turning to FIG. 6, another aspect of the present invention is a method of defining a compound library based on a whole molecule descriptor(s) of the test compounds, Block 600. The compound library is preferably representative of the larger compound space from which it is derived. Thus, this embodiment of the invention provides a method of exploring a relatively large compound space more efficiently and may furthermore obviate computational, time, cost or other restraints. This aspect of the invention can be used with any compound space, in particular, peptide, protein, carbohydrate, nucleic acid, and lipid (e.g., free fatty acids, triglycerols, steroids) compound spaces. Peptide spaces are preferred.

[0150] According to the present invention, operations are performed to reduce (i.e., contract) the compound space by classifying all test compounds in the space that share a global parameter(s) (i.e., a whole molecule parameter) as a single candidate compound, Block 602. The compounds in a group sharing common global characteristics are termed “compound isomers”. For example, all peptides with the same amino acid composition (i.e., having the same molecular weight) can be grouped as a respective candidate compound. As a further alternative, all peptides having the same chemical formula can be grouped as a single candidate compound (which will include peptides with different amino acid sequences, e.g., SVVVV and GIILS, C₂₃H₄₃N₅O₇).

[0151] In the case of a peptide space, the present investigations have found that a number of parameters may be determined that are independent of peptide sequence. These include, but are not limited to, total charge, molecular weight, logP (a measure of hydrophobicity), and total dipole. In such instances, the order of the amino acids in the sequence is irrelevant, and multiple peptide sequences may be represented by a common sequence. For example, the three peptides AKA, AAK, and KAA contain the same amino acids in the same relative proportions. Consequently, all have the same molecular weight, total charge, etc., even though their sequences are different. Therefore, all three peptides may be represented by a single sequence, such as AAK.

[0152] Thus, while AKA, AAK, and KAA are unique structures, their “global” properties may be sufficient for modeling activity. In that case, all three sequences would yield the same result (e.g., biological activity), so only one peptide is needed to provide the desired results and the other two are redundant. As a result, fewer peptide sequences would be needed to describe the property space of all peptides of a given length.
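The following is a minimal sketch of representing sequences by a common composition and computing two sequence-independent parameters. The residue masses are approximate average values for a small subset of amino acids, and the charge rule (one unit per K or R, minus one per D or E) is a crude simplification assumed here for illustration; the function names are hypothetical.

```python
from collections import defaultdict

# Approximate average residue masses in daltons (illustrative subset only).
RESIDUE_MASS = {"G": 57.05, "A": 71.08, "K": 128.17, "E": 129.12}
WATER_MASS = 18.02
# Simplified side-chain charges near neutral pH (assumption).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def composition_key(sequence):
    """Sorted residues: identical for every ordering of the same amino acids."""
    return "".join(sorted(sequence))


def global_parameters(sequence):
    """Return (molecular weight, total charge); neither depends on residue order."""
    # Summing in sorted order makes the floating-point result order-independent.
    weight = sum(RESIDUE_MASS[r] for r in sorted(sequence)) + WATER_MASS
    charge = sum(SIDE_CHAIN_CHARGE.get(r, 0) for r in sequence)
    return round(weight, 2), charge


def group_isomers(sequences):
    """Group sequences that share an amino acid composition."""
    groups = defaultdict(list)
    for sequence in sequences:
        groups[composition_key(sequence)].append(sequence)
    return dict(groups)


groups = group_isomers(["AKA", "AAK", "KAA", "AGK"])
print(groups)
print({s: global_parameters(s) for s in groups["AAK"]})
```

In this sketch AKA, AAK and KAA collapse onto the single key AAK, mirroring the representation by a common sequence described above.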

[0153] In one particular embodiment, as shown in FIG. 6, operations are performed to select less than all of the candidate compounds based on a space-filling design and to re-expand the selected candidate compounds into their constituent compound isomers, Block 604. Operations to re-expand the compounds function to re-introduce isomer-specific parameters (i.e., sequence-specific parameters). Operations are performed to select the test library from the re-expanded set of compounds using any method known in the art, Block 606. In particular embodiments, at least one (preferably one) compound is selected from each of the expanded groups of compound isomers.
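As one hedged illustration of the selection and re-expansion operations, the sketch below uses a greedy maximin-distance rule as the space-filling design. That rule is an assumption made for the example, since the description does not limit the design to any one technique. The descriptor coordinates are invented model values, and the function names are hypothetical.

```python
import math
from itertools import permutations


def maximin_select(points, k):
    """Greedy maximin selection: repeatedly add the candidate whose nearest
    already-chosen neighbour is farthest away."""
    labels = sorted(points)
    chosen = [labels[0]]  # arbitrary deterministic starting point
    while len(chosen) < k:
        remaining = [label for label in labels if label not in chosen]
        best = max(
            remaining,
            key=lambda label: min(math.dist(points[label], points[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


def expand(candidate):
    """Re-expand a candidate compound into its distinct sequence isomers."""
    return sorted({"".join(p) for p in permutations(candidate)})


# Hypothetical whole molecule descriptors for five candidate compounds.
points = {
    "AAAA": (0.0, 0.0),
    "AAAC": (1.0, 0.2),
    "AACC": (2.0, 0.4),
    "ACCC": (3.0, 0.6),
    "CCCC": (4.0, 0.8),
}
selected = maximin_select(points, 3)
library = [expand(candidate)[0] for candidate in selected]  # one isomer per group
print(selected)
print(library)
```

Taking the first isomer of each expanded group is an arbitrary choice in this sketch; any selection method could be substituted at that step.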

[0154] Preferably, the selected test library is representative of the larger compound space, and, likewise, it is preferred that the results achieved by evaluation of the test library are representative of those that would be obtained if the entire compound space were evaluated. Compound libraries generated according to this method can be used for any purpose known in the art, e.g., drug design, media formulation, and the like.

[0155] The foregoing aspect of the invention can be illustrated using the following simplified example with model data. The present invention may be used to evaluate a peptide space containing all possible peptide tetramers containing the amino acids A and/or C. There are sixteen possible tetramers if only these two amino acids are utilized (Table 10, column 1). The peptide space containing these sixteen compounds may be contracted by grouping those peptides sharing the same chemical formula (i.e., “compound isomers”) as five candidate compounds (Table 10, column 2). All peptides with this same whole molecule characteristic (e.g., chemical formula) are treated as a single peptide (i.e., candidate compound) with the same properties. Using any method known in the art (e.g., a space-filling design), two of the five candidate compounds may be selected (Table 10, column 3). The two selected candidate compounds may then be re-expanded into the ten individual compound isomers based on their sequence (Table 10, column 4). In the final step, one of the individual compound isomers is selected from each group (for a total of two peptides) to form the peptide library (Table 10, column 5).

TABLE 10

SEQ ID NO   Column 1   Column 2   Column 3   Column 4   Column 5
32          AAAA       AAAA       -          -          -
33          AAAC       AAAC       AAAC       AAAC       -
34          AACA       -          -          AACA       -
35          ACAA       -          -          ACAA       ACAA
36          CAAA       -          -          CAAA       -
37          AACC       AACC       AACC       AACC       -
38          ACAC       -          -          ACAC       -
39          CAAC       -          -          CAAC       CAAC
40          ACCA       -          -          ACCA       -
41          CACA       -          -          CACA       -
42          CCAA       -          -          CCAA       -
43          ACCC       ACCC       -          -          -
44          CACC       -          -          -          -
45          CCAC       -          -          -          -
46          CCCA       -          -          -          -
47          CCCC       CCCC       -          -          -
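The contraction and re-expansion steps of this simplified example can be reproduced with the short sketch below. The two selected candidates are fixed by hand to match the example, and the final pick of the first isomer in each group is arbitrary (Table 10 happens to pick ACAA and CAAC); the variable names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Column 1: every tetramer built from A and/or C.
peptide_space = ["".join(p) for p in product("AC", repeat=4)]

# Column 2: contract the space by grouping sequences with the same composition.
groups = defaultdict(list)
for sequence in peptide_space:
    groups["".join(sorted(sequence))].append(sequence)
candidates = sorted(groups)

# Column 3: two candidates selected (fixed by hand here, as in the example).
selected = ["AAAC", "AACC"]

# Column 4: re-expand the selected candidates into their compound isomers.
re_expanded = {candidate: groups[candidate] for candidate in selected}

# Column 5: one isomer per group forms the library (arbitrary pick).
library = [isomers[0] for isomers in re_expanded.values()]

print(len(peptide_space), candidates)
print(re_expanded)
print(library)
```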

[0156] In this simplified example, the number of peptides has been reduced from sixteen to two. However, in a more realistic and complex set of circumstances the reduction in the compound space will be significantly greater. Table 11 demonstrates the number of unique peptide sequences possible with increasing peptide length. There is an exponential increase in the number of different peptide sequences possible from the twenty naturally-occurring amino acids with an increase in the size of the peptide. Given X amino acids and a total of Y residues in a peptide sequence, the total number of possible sequences can be calculated as X^Y. The total numbers of peptide sequences for dipeptides through heptapeptides using 20 amino acids are listed in Table 11.

TABLE 11

Number of Residues   Total Number of Peptides
2                    400
3                    8,000
4                    160,000
5                    3,200,000
6                    64,000,000
7                    1,280,000,000

[0157] For example, 8000 different peptide trimers (i.e., sequences of three amino acids) are possible from the twenty amino acids. This number increases to 64,000,000 for hexapeptides and 1,280,000,000 for heptapeptides. Accordingly, it can be seen that with increasing complexity in compound size and composition, the total compound space can become too vast for efficient exploration thereof.

[0158] Given limitations in computational power and disk space, the total number of possible pentamer and longer peptides may be impractical for calculation of sequence-dependent physical parameters. This is significant when the property of interest is dependent on the sequence of the amino acids in the peptide. In such a case, the peptide may be viewed as a series of ordered building blocks. This is the common approach in peptide drug design, where the peptide fits into a highly defined receptor.

[0159] Table 12 demonstrates the reduction in the compound space achieved by grouping peptides having a common chemical formula into compound isomers. Although this table was derived empirically, later examination revealed it to be Pascal's triangle (Kotz and Johnson, Eds-in-Chief (1985) Encyclopedia of Statistical Sciences, Vol. 6, pp. 628-630, John Wiley & Sons, New York). As one example, the number of peptide heptamers possible from different combinations of the twenty amino acids is reduced to 657,800 by grouping peptides into isomers, as compared with 1,280,000,000 (see Table 11) without grouping of peptide isomers.

TABLE 12

Number of      Residues
amino acids    1     2      3       4       5        6         7
1              1     1      1       1       1        1         1
2              2     3      4       5       6        7         8
3              3     6      10      15      21       28        36
4              4     10     20      35      56       84        120
5              5     15     35      70      126      210       330
6              6     21     56      126     252      462       792
7              7     28     84      210     462      924       1716
8              8     36     120     330     792      1716      3432
9              9     45     165     495     1287     3003      6435
10             10    55     220     715     2002     5005      11440
11             11    66     286     1001    3003     8008      19448
12             12    78     364     1365    4368     12376     31824
13             13    91     455     1820    6188     18564     50388
14             14    105    560     2380    8568     27132     77520
15             15    120    680     3060    11628    38760     116280
16             16    136    816     3876    15504    54264     170544
17             17    153    969     4845    20349    74613     245157
18             18    171    1140    5985    26334    100947    346104
19             19    190    1330    7315    33649    134596    480700
20             20    210    1540    8855    42504    177100    657800
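The entries of Tables 11 and 12 can be checked with the sketch below. It assumes that peptides are grouped by amino acid composition, so that each Table 12 entry is the number of multisets of Y residues drawn from X amino acids, i.e., the binomial coefficient C(X + Y - 1, Y). That closed form is inferred from the tabulated values rather than stated in the description, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def sequence_count(num_amino_acids, num_residues):
    """Number of distinct ordered sequences: X^Y (Table 11)."""
    return num_amino_acids ** num_residues


def composition_count(num_amino_acids, num_residues):
    """Number of distinct compositions: C(X + Y - 1, Y) (Table 12)."""
    return comb(num_amino_acids + num_residues - 1, num_residues)


# Brute-force check on the two-letter tetramer example (five candidate compounds).
assert composition_count(2, 4) == len(list(combinations_with_replacement("AC", 4)))

for residues in range(2, 8):
    print(residues, sequence_count(20, residues), composition_count(20, residues))
```

For heptamers of twenty amino acids the two functions return 1,280,000,000 and 657,800 respectively, matching the figures quoted above.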

[0160] Turning to FIG. 7, the foregoing method of selecting a compound library can be used to identify a compound as a culture media component based on the parameters of the compound as described above, Block 700. A decision is made as to whether all compounds in the compound space need to be screened, Block 702. If it is determined that less than all of the compounds will be tested, operations are performed to reduce the compound space by representing each group of compound isomers within the compound space as a respective candidate compound, Block 704, and to then select less than all of the candidate compounds, Block 706. The decision to test less than all of the compounds in the compound space may be based on any criterion or set of criteria, e.g., cost considerations, time considerations, computational limitations, the nature of the particular property of the compounds being evaluated, etc.

[0161] The selected candidate compounds can be screened directly if sequence-specific parameters of the compounds are not deemed important to the selection process. Thus, a decision must be made as to whether sequence-specific parameters should be considered in the screening strategy, Block 708. If sequence-specific parameters are to be considered, then the compounds are expanded into the constituent compound isomers by re-introduction of sequence-specific parameters, Block 710. At this point, all of the expanded compound isomers can be screened, or, alternatively, less than all of the expanded compound isomers are selected, Blocks 712 and 714. Again, the decision to test all or less than all of the expanded compound isomers can be made on any basis. Finally, at whatever point in the decision process described above it is decided that a suitable set of compounds for screening has been identified, indicia of one or more selected properties are measured for a plurality of culture media, each containing a respective test compound, Block 716. Operations to identify a compound for use as a culture medium component can be carried out as described above, e.g., in FIG. 1 and FIG. 3.
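The decision sequence of FIG. 7 may be sketched as a single function, purely for illustration. The boolean flags and the two selector callables stand in for decisions that the description leaves to the practitioner; grouping by amino acid composition is assumed, and all names are hypothetical.

```python
from itertools import product


def composition_key(sequence):
    """Candidate compound label shared by all isomers of a composition."""
    return "".join(sorted(sequence))


def build_screening_set(compound_space, reduce_space, use_sequence_parameters,
                        select_candidates=None, select_isomers=None):
    """Return the compounds to be screened (measurement itself is Block 716)."""
    compounds = sorted(compound_space)
    if not reduce_space:                      # Block 702: screen every compound
        return compounds
    # Block 704: represent each group of compound isomers as one candidate.
    candidates = sorted({composition_key(s) for s in compounds})
    # Block 706: select less than all of the candidates.
    chosen = select_candidates(candidates) if select_candidates else candidates
    if not use_sequence_parameters:           # Block 708: screen candidates directly
        return chosen
    # Block 710: re-expand the chosen candidates into their isomers.
    expanded = [s for s in compounds if composition_key(s) in chosen]
    # Blocks 712 and 714: screen all isomers, or only a selected subset.
    return select_isomers(expanded) if select_isomers else expanded


space = ["".join(p) for p in product("AC", repeat=4)]
pick_two = lambda candidates: candidates[1:3]  # stand-in for a space-filling design
direct = build_screening_set(space, True, False, pick_two)
expanded = build_screening_set(space, True, True, pick_two)
print(direct)
print(len(expanded))
```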

[0162] Turning to FIG. 8, as another preferred embodiment, the invention provides a method of predicting the activity (e.g., biological activity) of a peptide based on at least one whole molecule parameter of the peptide (as defined above), Block 800. Those skilled in the art will appreciate that sequence-specific parameters may also be considered in addition to the at least one whole molecule parameter. As used herein, a “biological activity” includes pharmacological and biochemical activities. This method can be used to identify and/or design peptides with particular activities for use as therapeutic drugs (e.g., for medical or veterinary uses), in developing culture media components (as described in more detail above), for identifying and/or designing peptides that interact with a target molecule (e.g., receptor agonists or antagonists) or cell, and in identifying and/or designing peptides that induce or enhance, or alternatively prevent or inhibit, any activity of a target protein (e.g., a receptor, enzyme, signaling protein, cell-surface protein, nucleoprotein, ribosomal protein, and the like), cell, or nucleic acid (e.g., DNA, rRNA, mRNA, tRNA).

[0163] Accordingly, it will be apparent to one of skill in the art that this embodiment of the invention can be performed with cultured cells, tissues or organs. Alternatively, this embodiment can also be carried out in cell-free systems (e.g., lysed cells, cell fractions, or biochemically-defined systems such as purified enzymes or receptors).

[0164] According to this embodiment of the invention, indicia of a biological activity of interest of a plurality of test peptides from a first test library are measured, Block 802. Peptide libraries are as described hereinabove. The peptides are chosen from the first test library based on a space-filling design, as described in more detail hereinabove. Moreover, the first test library may be identified by first reducing a larger peptide space by grouping all of the peptides therein according to at least one whole molecule parameter, e.g., as described above and in FIG. 6.

[0165] Indicia of the biological activity may be measured using any suitable method known in the art, as discussed hereinabove with respect to Block 102.

[0166] A relationship (e.g., a mathematical relationship) is determined between the measured indicia of the biological activity and at least one whole molecule parameter (e.g., descriptor) of the plurality of first test peptides, Block 804. Operations to determine the relationship are carried out by any means known in the art, preferably, as described above in Blocks 104 and 306. In determining the relationship, sequence-specific parameters (i.e., not whole molecule parameters) may also be considered. Any whole molecule or sequence-specific parameter known in the art for describing peptides may be used to determine the relationship. Whole molecule parameters include, but are not limited to, total charge, molecular weight, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity. In particular embodiments, at least two whole molecule parameters are employed to describe each of the test peptides. In an alternate preferred embodiment, the peptides are described using at least two whole molecule parameters, where the first parameter is molecular weight and the second is total charge, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, or hydrophobicity.
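One possible form of such a relationship, offered only as a sketch, is a kernel-weighted (Nadaraya-Watson) estimate over two whole molecule parameters. The choice of a Gaussian kernel, the bandwidth, and the descriptor and activity values are all assumptions made for this example; in practice the descriptors would typically be scaled to comparable ranges first. The function name is hypothetical.

```python
import math


def kernel_estimate(x_new, x_train, y_train, bandwidth=1.0):
    """Estimate the activity at x_new as a Gaussian-weighted average of the
    activities measured for the first test peptides."""
    weights = [
        math.exp(-math.dist(x_new, x) ** 2 / (2 * bandwidth ** 2))
        for x in x_train
    ]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total


# Hypothetical scaled descriptors (molecular weight, total charge) and activities.
x_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y_train = [1.0, 3.0, 2.0, 4.0]

print(kernel_estimate((0.5, 0.5), x_train, y_train))
print(kernel_estimate((0.9, 0.1), x_train, y_train, bandwidth=0.3))
```

A smaller bandwidth makes the estimate follow the nearest measured peptides more closely, while a larger one smooths across the whole first test library.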

[0167] Referring again to FIG. 8, a test requirement related to the measured indicia of the biological activity is determined, Block 806. Operations to determine the test requirement are performed essentially as described for Blocks 106 and 304. The biological activity of a second peptide or plurality of peptides not within the first set of test compounds can be predicted based on the relationship between the biological activity and the whole molecule parameter(s) of the second peptide(s), so as to identify a peptide(s) that is expected to provide indicia of the biological activity that satisfies the test requirement.
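The identification step can be sketched as follows, using the distance-with-cutoff estimate recited in the claims: a second peptide is assigned the indicia of the nearest measured first test peptide if it lies within a cutoff distance. A single shared cutoff and a simple acceptable range are assumed here for brevity (the claims allow a cutoff specific to each first test compound), and all data values and names are hypothetical.

```python
import math


def nearest_neighbor_estimate(x_new, measured, cutoff):
    """Return the indicia of the nearest first test peptide, or None when no
    measured peptide lies within the cutoff distance."""
    descriptors, value = min(measured, key=lambda m: math.dist(x_new, m[0]))
    return value if math.dist(x_new, descriptors) <= cutoff else None


def identify(second_peptides, measured, cutoff, low, high):
    """Names of second peptides whose estimated indicia fall within the
    acceptable range [low, high] (the test requirement)."""
    hits = []
    for name, descriptors in second_peptides.items():
        estimate = nearest_neighbor_estimate(descriptors, measured, cutoff)
        if estimate is not None and low <= estimate <= high:
            hits.append(name)
    return hits


# Hypothetical (descriptor tuple, measured indicia) pairs for first test peptides.
measured = [((0.0, 0.0), 0.2), ((1.0, 1.0), 0.9)]
# Hypothetical descriptors for second peptides that were not measured.
second_peptides = {"P1": (0.9, 1.1), "P2": (0.1, 0.0), "P3": (5.0, 5.0)}

print(identify(second_peptides, measured, cutoff=0.5, low=0.8, high=1.0))
```

Peptides for which no estimate is available (such as P3 in this sketch) are simply left out, which is one conservative way of handling regions of the space not covered by the first test library.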

[0168] In the drawings and specification, there have been disclosed typical preferred embodiments of the invention and, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation, the scope of the invention being set forth in the following claims. Moreover, the terminology in the present description relating to graphs, plotting lines, determining a relationship, determining a “best fit” line, representing compound isomers as a candidate compound, expanding candidate compounds into their constituent compound isomers, etc., is intended to include the processing of data and parameters internal to a processing unit (e.g., a computer) containing memory and is not limited to the physical acts of printing or plotting lines, curves and graphs.

SEQUENCE LISTING

Number of sequences: 47. Each sequence listed below has a length of 4, is of type PRT, is from Artificial Sequence, and carries the description “Description of Artificial Sequence: hypothetical peptide”.

SEQ ID NO: 1    Gly Ala Leu Gly
SEQ ID NO: 2    Gln Gly Val Glu
SEQ ID NO: 3    Ser Ala Pro Val
SEQ ID NO: 4    Ser Pro Ala Gln
SEQ ID NO: 5    Glu Glu Val Phe
SEQ ID NO: 6    Val Leu Ser Lys
SEQ ID NO: 7    Val Ser Glu Leu
SEQ ID NO: 8    Pro Phe Glu Pro
SEQ ID NO: 9    Glu Leu Gln Glu
SEQ ID NO: 10   Lys Val Gln Phe
SEQ ID NO: 11   Gly Lys Ala Pro
SEQ ID NO: 12   Ala Gln Lys Ser
SEQ ID NO: 13   Ala Gln Gly Glu
SEQ ID NO: 14   Lys Glu Phe Gly
SEQ ID NO: 15   Pro Ser Phe Lys
SEQ ID NO: 16   Phe Ser Leu Ala
SEQ ID NO: 17   Leu Phe Gly Ala
SEQ ID NO: 18   Glu Val Lys Ser
SEQ ID NO: 19   Val Gly Glu Ala
SEQ ID NO: 20   Gln Glu Ser Gln
SEQ ID NO: 21   Gly Ala Pro Val
SEQ ID NO: 22   Ser Ala Leu Gly
SEQ ID NO: 23   Asp Lys Ala His
SEQ ID NO: 24   Asp Trp Pro Ala
SEQ ID NO: 25   Glu Ser Met His
SEQ ID NO: 26   Gly Val Asn Glu
SEQ ID NO: 27   His Glu Asp Val
SEQ ID NO: 28   Glu Thr Gly Ser
SEQ ID NO: 29   His Tyr Gly Val
SEQ ID NO: 30   Asp Phe Gly Val
SEQ ID NO: 31   His Tyr Pro Val
SEQ ID NO: 32   Ala Ala Ala Ala
SEQ ID NO: 33   Ala Ala Ala Cys
SEQ ID NO: 34   Ala Ala Cys Ala
SEQ ID NO: 35   Ala Cys Ala Ala
SEQ ID NO: 36   Cys Ala Ala Ala
SEQ ID NO: 37   Ala Ala Cys Cys
SEQ ID NO: 38   Ala Cys Ala Cys
SEQ ID NO: 39   Cys Ala Ala Cys
SEQ ID NO: 40   Ala Cys Cys Ala
SEQ ID NO: 41   Cys Ala Cys Ala
SEQ ID NO: 42   Cys Cys Ala Ala
SEQ ID NO: 43   Ala Cys Cys Cys
SEQ ID NO: 44   Cys Ala Cys Cys
SEQ ID NO: 45   Cys Cys Ala Cys
SEQ ID NO: 46   Cys Cys Cys Ala
SEQ ID NO: 47   Cys Cys Cys Cys

What is claimed is:
 1. A method of identifying a culture medium component, comprising the steps of: measuring first indicia of a property of a plurality of first culture media which each contains a respective first test compound from within a first test library; determining a relationship between at least one parameter of the first test compounds and the measured first indicia of the property of the plurality of first culture media; determining a test requirement relating to the measured first indicia; and identifying a second test library containing a plurality of second test compounds as components of a plurality of second culture media which based on the relationship are expected to provide second indicia of the property which meets the test requirement.
 2. The method of claim 1, wherein the plurality of first test compounds is selected from the first test library using a space-filling technique; and wherein the plurality of second test compounds is selected from the second test library using a space-filling technique.
 3. The method of claim 1, wherein said step of determining a relationship comprises the step of determining ŷ_i = f(x_ij), where x_ij denotes a parameter, i ranges from 1 to n where n represents the number of first culture media in the plurality thereof, j ranges from 1 to d where d represents the number of parameters, and ŷ_i represents an estimate of the measured first indicia of the property of the plurality of first culture media.
 4. The method of claim 3, wherein said step of determining a test requirement comprises the step of determining a range of acceptable indicia of the property.
 5. The method of claim 4, wherein said identifying step comprises determining from ŷ_i = f(x_ij) estimated indicia of the property of a plurality of second culture media which each contains a respective test compound; and wherein at least one of the second culture media contains a test compound that is not within the first test library.
 6. The method of claim 5, wherein said identifying step further comprises: determining which of the estimated indicia are within the range of acceptable indicia; and determining from the estimated indicia that are within the range, the plurality of second test compounds from the second test library.
 7. The method of claim 6, further comprising the step of measuring second indicia of the property of the plurality of second culture media.
 8. The method of claim 7, wherein f(x_ij) is a non-parametric regression formula.
 9. The method of claim 3, wherein f(x_ij) is a non-parametric regression formula.
 10. The method of claim 1, wherein said step of determining a relationship comprises the step of: determining a distance function d(x₁, x₂) between a first value of a parameter, x₁, of a first test compound and a second value of the parameter, x₂, of a second test compound not within the plurality of first test compounds; and estimating indicia of the property of a culture medium containing the second test compound as the indicia of the property of the culture medium containing the first test compound if d(x₁, x₂) ≤ d_cutoff1, where d_cutoff1 is a cutoff distance for the first test compound.
 11. The method of claim 1, wherein said measuring step is preceded by a step of defining a first test library by representing each of a plurality of groups of compound isomers from a compound space as a respective candidate compound.
 12. The method of claim 11, further comprising the step of expanding less than all of the candidate compounds determined in said representing step into their constituent compound isomers using a space-filling technique.
 13. The method of claim 1, wherein the at least one parameter is selected from the group consisting of whole molecule, sequence-specific, and topological parameters.
 14. The method of claim 1, wherein the at least one parameter is a whole molecule parameter.
 15. The method of claim 14, wherein the whole molecule parameter is selected from the group consisting of total charge, molecular weight, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
 16. A culture medium component identified by the method of claim 1.
 17. A culture medium comprising the culture medium component of claim 16.
 18. The method of claim 1, wherein the first and second test libraries are selected from the group consisting of peptide, polynucleotide, nucleic acid, carbohydrate, free fatty acid, and lipid libraries.
 19. The method of claim 1, wherein the first and second test libraries are test peptide libraries.
 20. The method of claim 19, wherein the first and second test peptide libraries consist of peptides having a length in a range from about four amino acids to about twenty amino acids.
 21. The method of claim 19, wherein the peptides in the first test library comprise at least one amino acid position that is nonvariable or is designated by a limited number of possible amino acids.
 22. The method of claim 19, wherein the peptides in the second test library comprise at least one amino acid position that is nonvariable or is designated by a limited number of possible amino acids.
 23. The method of claim 1, wherein said step of measuring first indicia is preceded by the step of forming a plurality of cell cultures which each contains a respective first culture medium from within the plurality thereof.
 24. The method of claim 23, further comprising the step of conditioning the cell cultures to grow in both chemically-undefined and chemically-defined media prior to said measuring step.
 25. The method of claim 23, wherein the cell cultures are selected from the group consisting of mammalian, insect, plant, fungal, yeast, protozoan and bacterial cell cultures.
 26. The method of claim 23, wherein the plurality of culture media are chemically-defined culture media.
 27. The method of claim 23, wherein the measured property of the plurality of first culture media is the ability to alter the growth, proliferation, maturation or differentiation of cultured cells.
 28. The method of claim 23, wherein the measured property of the plurality of first culture media is the ability to alter peptide or protein production by cultured cells.
 29. The method of claim 28, wherein the peptide or protein is selected from the group consisting of antigens, toxins, antibodies, hormones, growth factors, cytokines, clotting factors, and enzymes.
 30. The method of claim 23, wherein the measured property of the plurality of first culture media is the ability to alter the production of a compound selected from the group consisting of antibiotics, steroids, carbohydrates, lipids and nucleic acids by cultured cells.
 31. A method of defining a test compound library, comprising the step of representing each of a plurality of groups of compound isomers from within a compound space as a respective candidate compound.
 32. The method of claim 31, wherein said representing step is followed by the steps of: selecting less than all of the candidate compounds using a space-filling technique; and expanding the less than all of the candidate compounds determined in said selecting step into their constituent compound isomers.
 33. The method of claim 32, further comprising the step of selecting at least one constituent compound isomer for each of the candidate compounds.
 34. The method of claim 31, wherein the compound space and test compound library consist of compounds selected from the group consisting of peptides, polynucleotides, nucleic acids, carbohydrates, free fatty acids, and lipids.
 35. The method of claim 31, wherein the compound space is a peptide space and the test compound library is a test peptide library.
 36. A test compound library wherein each of a plurality of compound isomers from within a compound space is represented as a respective candidate compound.
 37. The test compound library of claim 36, wherein the compound space is a peptide space and the test compound library is a test peptide library.
 38. A test compound library formed by the method of claim 31.
 39. A test compound library formed by the method of claim 32.
 40. The test compound library of claim 39, wherein the compound space is a peptide space and the test compound library is a test peptide library.
 41. A method of identifying a culture medium component, comprising the steps of: defining a first test library by representing each of a plurality of groups of compound isomers from within a compound space as a respective candidate compound; measuring first indicia of a property of a plurality of first culture media which each contains a respective first test compound from within the first test library; determining a relationship between at least one parameter of the first test compounds and the measured first indicia of the property of the plurality of first culture media; determining a test requirement relating to the measured first indicia; and identifying a plurality of second test compounds from within the compound space as components of a plurality of second culture media which based on the relationship are expected to provide second indicia of the property which meets the test requirement.
 42. The method of claim 41, wherein said defining step further comprises expanding the less than all of the candidate compounds determined in said representing step into their constituent compound isomers using a space-filling technique.
 43. The method of claim 42, wherein said defining step further comprises selecting at least one constituent compound isomer from each of the candidate compounds.
 44. The method of claim 41, wherein the plurality of first test compounds is selected from the first test library using a space-filling technique.
 45. The method of claim 41, wherein said step of determining a relationship comprises the step of determining ŷ_i = f(x_ij), where x_ij denotes a parameter, i ranges from 1 to n where n represents the number of first culture media in the plurality thereof, j ranges from 1 to d where d represents the number of parameters, and ŷ_i represents an estimate of the measured first indicia of the property of the plurality of first culture media.
 46. The method of claim 45, wherein said step of determining a test requirement comprises the step of determining a range of acceptable indicia of the property.
 47. The method of claim 46, wherein said identifying step comprises determining from ŷ_i = f(x_ij) estimated indicia of the property of a plurality of second culture media which each contains a respective test compound, wherein at least one of the second culture media contains a test compound that is not within the first test library.
 48. The method of claim 47, wherein said identifying step further comprises: determining which of the estimated indicia are within the range of acceptable indicia; and determining from the estimated indicia that are within the range, the plurality of second test compounds from the compound space.
 49. The method of claim 45, wherein f(x_ij) is a non-parametric regression formula.
 50. The method of claim 41, wherein said step of determining a relationship comprises the step of: determining a distance function d(x₁, x₂) between a first value of a parameter, x₁, of a first test compound and a second value of the parameter, x₂, of a second test compound not within the plurality of first test compounds; and estimating indicia of the property of a culture medium containing the second test compound as the indicia of the property of the culture medium containing the first test compound if d(x₁, x₂) ≤ d_cutoff1, where d_cutoff1 is a cutoff distance for the first test compound.
 51. The method of claim 41, wherein the at least one parameter is selected from the group consisting of whole molecule, sequence-specific, and topological parameters.
 52. The method of claim 41, wherein the at least one parameter is a whole molecule parameter.
 53. The method of claim 41, wherein the compound space is selected from the group consisting of peptide, polynucleotide, nucleic acid, carbohydrate, free fatty acid, and lipid spaces.
 54. The method of claim 41, wherein the compound space is a peptide space.
 55. A culture medium component identified by the method of claim 41.
 56. A culture medium comprising the culture medium component of claim 55.
 57. The method of claim 41, wherein said step of measuring first indicia of the property comprises adding each of the plurality of first culture media to a respective cell culture to form a plurality of cell cultures each containing a respective culture medium containing a respective first test compound.
 58. The method of claim 57, wherein the plurality of cellcultures is selected from the group consisting of mammalian, insect,plant, fungal, yeast, protozoan and bacterial cell cultures.
 59. A method of predicting indicia of a property of a peptide, comprising the steps of: measuring indicia of an activity of a plurality of test peptides from a test peptide library; determining a relationship between the measured indicia of the activity and at least one whole molecule parameter of the plurality of test peptides; and predicting the indicia of the activity of a peptide not within the plurality of test peptides based on the relationship.
 60. The method of claim 59, wherein the first plurality of test peptides from the test peptide library is selected using a space-filling technique.
 61. The method of claim 59, wherein the at least one whole molecule parameter comprises a parameter selected from the group consisting of total charge, molecular weight, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
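For illustration, three of the whole molecule parameters listed in claim 61 (total charge, molecular weight, and hydrophobicity) may be computed from a peptide sequence roughly as sketched below. The sketch rests on assumptions that are not part of the claims: total charge is approximated by counting Lys/Arg as +1 and Asp/Glu as -1, ignoring the termini and histidine; molecular weight uses approximate average residue masses; and hydrophobicity is taken as the mean Kyte-Doolittle hydropathy. The function name `whole_molecule_parameters` is hypothetical.

```python
# Approximate average residue masses in daltons (illustrative values).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.015  # one water is added back for the free termini

# Kyte-Doolittle hydropathy scale (an assumed choice of hydrophobicity scale).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def whole_molecule_parameters(sequence):
    """Return crude whole molecule parameters for a one-letter peptide sequence."""
    positive = sum(1 for r in sequence if r in "KR")
    negative = sum(1 for r in sequence if r in "DE")
    return {
        "total_charge": positive - negative,
        "molecular_weight": sum(RESIDUE_MASS[r] for r in sequence) + WATER_MASS,
        "hydrophobicity": sum(HYDROPATHY[r] for r in sequence) / len(sequence),
    }


if __name__ == "__main__":
    print(whole_molecule_parameters("GAKD"))
```

Under these simplifications each parameter depends only on amino acid composition, so a reordered sequence such as "DKAG" yields the same values as "GAKD" up to floating-point rounding.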
 62. The method of claim 59, wherein at least two whole molecule parameters of the plurality of test peptides are selected from the group consisting of total charge, molecular weight, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
 63. The method of claim 59, wherein the at least one whole molecule parameter comprises hydrophobicity, molecular weight, total dipole moment, and total charge.
 64. The method of claim 59, wherein the at least one whole molecule parameter comprises molecular weight and at least one additional parameter selected from the group consisting of total charge, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
 65. The method of claim 59, wherein the activity is binding to a receptor.
 66. The method of claim 59, wherein the activity is enhancement or inducement of a biological activity in a cell.
 67. The method of claim 59, wherein the activity is inhibition or prevention of a biological activity in a cell.
 68. The method of claim 66 or claim 67, wherein the cell is a cell cultured in vitro.
 69. The method of claim 68, wherein said step of measuring indicia of the property comprises: forming a plurality of culture media that each contains a respective test peptide from the plurality thereof; and adding each of the plurality of culture media to a respective cell culture to form a plurality of cell cultures each containing a respective culture medium containing a respective test compound.
 70. The method of claim 59, wherein the activity is enhancement or inhibition of a receptor.
 71. The method of claim 59, wherein the activity is enhancement or inducement of activation of a receptor.
 72. The method of claim 59, wherein the test peptide library consists of peptides having a length in a range from about four amino acids to about twenty amino acids.
 73. The method of claim 59, wherein the test peptide library consists of peptides having a length in a range from about four amino acids to about ten amino acids.
 74. A method of identifying a peptide with a predicted indicia of an activity that satisfies a test requirement, comprising the steps of: measuring first indicia of an activity of a plurality of first test peptides from a first test peptide library; determining a relationship between the measured first indicia of the activity and at least one whole molecule parameter of the plurality of first test peptides; determining a test requirement relating to the measured first indicia; and identifying a second test peptide library containing a plurality of second test peptides which based on the relationship are expected to provide second indicia of the activity that meets the test requirement.
 75. The method of claim 74, wherein the plurality of first test peptides is selected from the first test peptide library using a space-filling technique.
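The claims do not name a particular space-filling technique. As one possible example, a greedy maximin selection picks test peptides whose parameter vectors are spread as widely as possible across the parameter space. The sketch below assumes Euclidean distance and an arbitrary starting point; the function name `maximin_subset` is hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def maximin_subset(points, k):
    """Greedily choose up to k indices of points that are mutually far apart.

    points: list of parameter vectors (tuples of floats), one per peptide.
    At each step the point whose nearest already-chosen neighbour is
    farthest away is added to the selection.
    """
    if not points or k <= 0:
        return []
    chosen = [0]  # assumption: start from the first point in the library
    while len(chosen) < min(k, len(points)):
        remaining = (i for i in range(len(points)) if i not in chosen)
        best = max(
            remaining,
            key=lambda i: min(dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    library = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.5, 0.0), (0.9, 0.0)]
    print(maximin_subset(library, 3))
```

A greedy pass of this kind is simple and deterministic, but it is only one of several designs (for example, Latin hypercube or coverage designs) that could serve as the space-filling technique.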
 76. The method of claim 74, wherein said step of determining a relationship comprises the step of determining ŷ_(i)=f(x_(ij)), where x_(ij) denotes a parameter, i ranges from 1 to n where n represents the number of first test peptides in the plurality thereof, j ranges from 1 to d where d represents the number of whole molecule parameters, and ŷ_(i) represents an estimate of the measured first indicia of the activity of the plurality of first test peptides.
 77. The method of claim 76, wherein said step of determining a test requirement comprises the step of determining a range of acceptable indicia of the activity.
 78. The method of claim 77, wherein said identifying step further comprises: determining which of the estimated indicia are within the range of acceptable indicia; and determining from the estimated indicia that are within the range, the plurality of second test peptides from the second test peptide library.
 79. The method of claim 76, wherein f(x_(ij)) is a non-parametric regression formula.
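As a hedged illustration of claims 76 through 79, the relationship ŷ_(i)=f(x_(ij)) may be realized with a kernel-weighted average, which is one common form of non-parametric regression, and the second test peptides may then be chosen as those candidates whose estimates fall within the acceptable range. The Gaussian kernel, the `bandwidth` value, and the names `kernel_estimate` and `select_in_range` are assumptions made for this sketch and are not recited in the claims.

```python
from math import exp


def kernel_estimate(x_train, y_train, x, bandwidth=1.0):
    """Kernel-weighted average of measured indicia (Nadaraya-Watson form).

    x_train: parameter vectors x_ij of the n measured first test peptides.
    y_train: their measured indicia.
    x:       parameter vector of the peptide to be estimated.
    Note: if x is extremely far from every training point, all weights can
    underflow to zero and the division below will fail.
    """
    weights = [
        exp(-sum((a - b) ** 2 for a, b in zip(xi, x)) / (2 * bandwidth ** 2))
        for xi in x_train
    ]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)


def select_in_range(x_train, y_train, candidates, low, high, bandwidth=1.0):
    """Return names of candidates whose estimated indicia lie in [low, high]."""
    selected = []
    for name, x in candidates.items():
        y_hat = kernel_estimate(x_train, y_train, x, bandwidth)
        if low <= y_hat <= high:
            selected.append(name)
    return selected


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.0,)]
    ys = [0.0, 1.0, 2.0]
    pool = {"low": (0.0,), "mid": (1.0,), "high": (2.0,)}
    print(select_in_range(xs, ys, pool, 0.8, 1.2, bandwidth=0.5))
```

The bandwidth controls how far the influence of each measured peptide extends; in practice it would need to be tuned, for example by cross-validation on the measured indicia.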
 80. The method of claim 74, wherein said step of determining a relationship comprises the step of: determining a distance function d(x₁, x₂) between a first value of a whole molecule parameter, x₁, of a first test peptide and a second value of the whole molecule parameter, x₂, of a second test peptide not within the first test peptide library; and estimating indicia of the activity of the second test peptide as the indicia of the activity of the first test peptide if d(x₁, x₂)≦d_(cutoff1), where d_(cutoff1) is a cutoff distance for the first test peptide.
 81. The method of claim 74, wherein said measuring step is preceded by the step of defining a first test peptide library by representing each of a plurality of groups of peptide isomers from a first peptide space as a respective candidate peptide.
 82. The method of claim 81, further comprising the step of expanding less than all of the candidate peptides determined in said representing step into their constituent compound peptides using a space-filling technique.
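The representing and expanding steps of claims 81 and 82 may be pictured with the following sketch. It assumes, as an interpretation not stated in the claims, that "peptide isomers" are peptides sharing the same amino acid composition in a different order, so that each group can be represented by a single candidate (here, the sorted sequence). The names `group_isomers` and `expand` are hypothetical, and the choice of which candidates to expand is left to the caller.

```python
from collections import defaultdict
from itertools import permutations


def group_isomers(peptides):
    """Group sequences by composition; the key is the candidate peptide."""
    groups = defaultdict(list)
    for seq in peptides:
        candidate = "".join(sorted(seq))  # assumed representative form
        groups[candidate].append(seq)
    return dict(groups)


def expand(candidate):
    """Expand a candidate peptide into all of its distinct sequence isomers."""
    return sorted({"".join(p) for p in permutations(candidate)})


if __name__ == "__main__":
    space = ["GAK", "KAG", "AGK", "GGA"]
    print(group_isomers(space))
    print(expand("AGG"))
```

Because the number of isomers grows factorially with peptide length, expanding only a selected subset of candidates keeps the resulting library at a manageable size.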
 83. The method of claim 74, wherein the at least one whole molecule parameter is selected from the group consisting of total charge, molecular weight, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
 84. The method of claim 74, wherein at least two whole molecule parameters are selected from the group consisting of total charge, molecular weight, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
 85. The method of claim 74, wherein the at least one parameter comprises hydrophobicity, molecular weight, total dipole moment, and total charge.
 86. The method of claim 74, wherein the at least one whole molecule parameter is molecular weight and at least one additional parameter selected from the group consisting of total charge, isoelectric point, total dipole moment, isotropic surface area, electronic charge index, and hydrophobicity.
 87. The method of claim 74, wherein the activity is binding to a receptor.
 88. The method of claim 74, wherein the activity is enhancement or inducement of a biological activity in a cell.
 89. The method of claim 74, wherein the activity is inhibition or prevention of a biological activity in a cell.
 90. The method of claim 88 or claim 89, wherein the cell is a cell cultured in vitro.
 91. The method of claim 90, wherein said step of measuring first indicia of the activity comprises: forming a plurality of culture media that each contains a respective test peptide from the plurality thereof; and adding each of the plurality of culture media to a respective cell culture to form a plurality of cell cultures each containing a respective culture medium containing a respective first test compound.
 92. The method of claim 74, wherein the activity is inhibition or prevention of activation of a receptor.
 93. The method of claim 74, wherein the activity is enhancement or inducement of activation of a receptor.
 94. The method of claim 74, wherein the test peptide library consists of peptides having a length in a range from about four amino acids to about twenty amino acids.
 95. The method of claim 74, wherein the test peptide library consists of peptides having a length in a range from about four amino acids to about ten amino acids.
 96. A method of identifying a culture medium component, comprising the steps of: culturing a plurality of first cell cultures in a plurality of first culture media each containing a respective first test compound from a first test library; measuring first indicia of a property of the plurality of first culture media in the plurality of first cell cultures; determining a relationship between the measured first indicia of the property and at least one parameter of the plurality of first culture media; determining a test requirement relating to the measured first indicia; and culturing a plurality of second cell cultures in a plurality of second culture media each containing a respective second test compound from a second test library; wherein based on the relationship the plurality of second culture media containing the second test compounds are predicted to give indicia of the property that satisfy the test requirement.
 97. The method of claim 96, further comprising the step of reformulating the culture medium containing the identified culture medium component to omit components.
 98. The method of claim 97, wherein at least one component is omitted from the culture medium formulation.
 99. The method of claim 96, wherein said step of culturing a plurality of first cell cultures further comprises the steps of: culturing the plurality of first cell cultures in a chemically-defined culture medium and a chemically-undefined culture medium each containing a test compound from the first test library; and comparing the measured indicia of the property for the same test compound in the chemically-defined culture medium and the chemically-undefined culture medium.
 100. The method of claim 96, wherein said measuring step is preceded by the step of conditioning the cell cultures to grow in both chemically-undefined and chemically-defined media.
 101. The method of claim 96, wherein the plurality of first and second culture media comprise a concentration of an undefined protein component in a range from about 0.1% (w/v) to about 2.5% (w/v).
 102. The method of claim 101, wherein the undefined protein component is selected from the group consisting of hydrolysates, digests, extracts, and infusions.
 103. The method of claim 96, wherein the plurality of first and second culture media comprise a concentration of an undefined protein component in a range from about 0.25% (w/v) to about 1% (w/v).
 104. The method of claim 96, wherein the plurality of first and second culture media comprise a concentration of serum in a range from about 0.05% (v/v) to about 30% (v/v).
 105. The method of claim 96, wherein the plurality of first and second cell cultures is selected from the group consisting of mammalian, insect, plant, fungal, yeast, protozoan and bacterial cell cultures.
 106. The method of claim 96, wherein the plurality of first and second culture media are chemically-defined culture media.
 107. The method of claim 96, wherein the plurality of first and second culture media are liquid culture media.
 108. The method of claim 96, wherein the measured property of the plurality of first culture media is the ability to alter growth, maturation, proliferation, or differentiation of cultured cells.
 109. The method of claim 96, wherein the measured property of the plurality of first culture media is the ability to alter peptide or protein synthesis by cultured cells.
 110. The method of claim 109, wherein the peptide or protein is selected from the group consisting of antigens, toxins, antibodies, hormones, growth factors, cytokines, clotting factors, and enzymes.
 111. The method of claim 96, wherein the measured property of the plurality of first culture media is the ability to alter the synthesis of a compound selected from the group consisting of antibiotics, steroids, carbohydrates, lipids and nucleic acids by cultured cells.
 112. The method of claim 96, wherein the measured property of the plurality of first culture media is the ability to alter peptide or protein secretion by cultured cells.
 113. A culture medium component identified by the method of claim 96.
 114. A culture medium comprising the culture medium component of claim 113.
 115. The culture medium of claim 114, wherein the culture medium comprises a concentration of an undefined protein component in a range from about 0.1% (w/v) to about 2.5% (w/v).
 116. The culture medium of claim 115, wherein the undefined protein component is selected from the group consisting of hydrolysates, digests, extracts and infusions.
 117. The culture medium of claim 114, wherein the culture medium comprises a concentration of serum in a range from about 0.05% (v/v) to about 30% (v/v).
 118. The culture medium of claim 114, wherein the culture medium comprises insulin.
 119. An apparatus for identifying a culture medium component, comprising: means for determining a relationship between measured first indicia of a property of a plurality of first culture media which each contains a respective first test compound from within a first test library and at least one parameter of the first test compounds; and means for identifying a second test library containing a plurality of second test compounds as components of a plurality of second culture media which based on the relationship are expected to provide second indicia of the property which meets a test requirement relating to the measured first indicia.
 120. The apparatus of claim 119, wherein said determining means comprises means for determining ŷ_(i)=f(x_(ij)), where x_(ij) denotes a parameter, i ranges from 1 to n where n represents the number of first culture media in the plurality thereof, j ranges from 1 to d where d represents the number of parameters, and ŷ_(i) represents an estimate of the measured first indicia of the property of the plurality of first culture media.
 121. The apparatus of claim 120, wherein said identifying means comprises means for determining from ŷ_(i)=f(x_(ij)) estimated indicia of the property of a plurality of second culture media which each contains a respective test compound, wherein at least one of the second culture media contains a test compound that is not within the first test library.
 122. The apparatus of claim 119, wherein said determining means comprises: means for determining a distance function d(x₁, x₂) between a first value of a parameter, x₁, of a first test compound and a second value of a parameter, x₂, of a second test compound not within the plurality of first test compounds; and means for estimating indicia of the property of a culture medium containing the second test compound as the indicia of the property of the culture medium containing the first test compound if d(x₁, x₂)≦d_(cutoff1), where d_(cutoff1) is a cutoff distance for the first test compound.
 123. A computer program product readable by a machine and tangibly embodying a program of instructions executable by the machine to perform the method steps of: determining a relationship between measured first indicia of a property of a plurality of first culture media which each contains a respective first test compound from within a first test library and at least one parameter of the first test compounds; and identifying a second test library containing a plurality of second test compounds as components of a plurality of second culture media which based on the relationship are expected to provide second indicia of the property which meets a test requirement relating to the measured first indicia.
 124. The computer program product of claim 123, wherein said determining step further comprises determining ŷ_(i)=f(x_(ij)), where x_(ij) denotes a parameter, i ranges from 1 to n where n represents the number of first culture media from within a plurality thereof, j ranges from 1 to d where d represents the number of parameters, and ŷ_(i) represents an estimate of the measured first indicia of the property of the first test compounds.
 125. The computer program product of claim 124, wherein said identifying step further comprises determining from ŷ_(i)=f(x_(ij)) estimated indicia of the property of a plurality of second culture media which each contains a respective test compound, wherein at least one of the second culture media contains a test compound that is not within the first test library.
 126. The computer program product of claim 125, wherein said determining step further comprises: determining a distance function d(x₁, x₂) between a first value of a parameter, x₁, of a first test compound and a second value of a parameter, x₂, of a second test compound not within the plurality of first test compounds; and estimating indicia of the property of a culture medium containing the second test compound as the indicia of the property of the culture medium containing the first test compound if d(x₁, x₂)≦d_(cutoff1), where d_(cutoff1) is a cutoff distance for the first test compound.
 127. The method of claim 97, wherein a vitamin is omitted from the culture medium formulation.